I’m not really a “New Year’s Resolution” person. I’ve never been the “new year, new me” type. But I do find myself getting introspective when the calendar flips over. I like to think about what went well, what hurdles I cleared, and which ones I tripped over face-first.
In 2025, the majority of my learning focused on Agentic AI. It became a huge part of my day job, and I decided to use it to pursue a solo-founder idea in a way that would have been simply impossible in 2024.
Living in the trenches with these models for 12 months has taught me a lot. Here are my biggest lessons learned about Agentic AI in 2025.
1. Agents are lazy (The Agentic AI Completion Bias)
Probably my single most frustrating issue when using Gemini or Claude is that they are profoundly lazy. “Lazy” isn’t the technical term, but it’s what I mutter at my screen when I see them violating the clear rules I’ve set out for them.
When you corner an agent and demand an explanation for why it just did something dramatically wrong, it will often admit to what we call “completion bias.” When confronted with a decision where one choice is correct (but hard) and the other is fast (but wrong), the model will check the “task complete” box as fast as possible.
I'm increasingly convinced this is the root cause of the "agents are stupid" narrative. They often know the right path; they just pick the easy one. Over a long run of tool calls, those easy choices compound and drift the results off course, and suddenly you're typing in all caps and cursing at an LLM.
This behavior is why I've embraced a "shift-left" mentality around testing. My current favorite tool is a pre-commit hook that specifically scans for any, unknown, and other "forbidden" types in my TypeScript code. Strict typing is valuable because it forces my agents to stay in their lane. The agents hate that hook. I love it.
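To give you the flavor, here's a stripped-down sketch of the idea, not my exact hook: scan the staged TypeScript files for the escapes I've banned and refuse the commit if any show up. (A real setup could just as easily lean on typescript-eslint's no-explicit-any rule; the file name and patterns here are deliberately crude and purely illustrative.)

```ts
// check-forbidden-types.ts — a rough sketch, not a production hook.
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

// Type escapes I don't want an agent sneaking past me.
const FORBIDDEN = [/:\s*any\b/, /\bas any\b/, /:\s*unknown\b/, /@ts-ignore/];

// Only inspect files staged for this commit (added/copied/modified).
const staged = execSync("git diff --cached --name-only --diff-filter=ACM", {
  encoding: "utf8",
})
  .split("\n")
  .filter((file) => /\.tsx?$/.test(file));

let failed = false;
for (const file of staged) {
  readFileSync(file, "utf8")
    .split("\n")
    .forEach((line, i) => {
      if (FORBIDDEN.some((re) => re.test(line))) {
        console.error(`${file}:${i + 1} forbidden type: ${line.trim()}`);
        failed = true;
      }
    });
}

// A non-zero exit aborts the commit when this runs from a pre-commit hook.
process.exit(failed ? 1 : 0);
```

Wire something like that into .git/hooks/pre-commit (or husky, or whatever you prefer) and the agents can sulk all they want.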
2. Agents have an oddly-shaped memory
We’ve known about the “Lost in the Middle” phenomenon for a while, but feeling it in practice is different than reading the paper.
Yes, modern models have a million tokens of context. But their recall isn’t a perfect flat line. It’s U-shaped. They remember the beginning (your system prompt) and the end (your most recent anger), but the middle gets fuzzy.
I find myself typing /clear a lot more often these days. When the forgotten center of that U-shape contains critical information about your project’s architecture, things get hairy fast.
One of my favorite features in Gemini 3 is the ability to strictly separate ‘system’ level information from ‘user’ level information. The model treats ‘system’ data as immutable law. In my Gemini CLI tools, I use custom system prompts to ensure the most critical context never gets lost in the middle.
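To make that concrete, here's roughly what the separation looks like if you call the model directly with the @google/genai SDK; this is a sketch rather than my actual CLI setup, and the model id, prompt text, and ask are all stand-ins.

```ts
import { GoogleGenAI } from "@google/genai";

// The non-negotiable rules live in the system instruction, sent with every
// request, so they can't sink into the fuzzy middle of the chat history.
const SYSTEM_PROMPT = `
You are working in a strict TypeScript monorepo.
Never introduce 'any' or 'unknown'.
Never invent API fields; the schemas in the contracts package are the source of truth.
`;

async function main() {
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash", // stand-in; use whatever model you're running
    // The user-level ask goes in contents and churns from turn to turn...
    contents: "Add pagination to the devices endpoint.",
    // ...while the system-level rules ride along separately on every call.
    config: { systemInstruction: SYSTEM_PROMPT },
  });

  console.log(response.text);
}

main();
```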
I’ve written before about Context Coherence vs. Model Superiority, and this year solidified that opinion. I now treat my system prompts as code. They are version-controlled, tested, and optimized to keep the agent’s focus locked on the architectural constraints that matter most, rather than letting it wander into the fuzzy middle of its context window.
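In practice that means the prompt lives in the repo next to a test that fails if the constraints ever get "optimized" away. A toy sketch using Node's built-in test runner, with hypothetical file names and phrases:

```ts
// prompts/system.test.ts — toy guardrails for a version-controlled prompt.
import { test } from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";

const prompt = readFileSync("prompts/system.md", "utf8").toLowerCase();

// Constraints that must never fall out of the prompt during a rewrite.
const REQUIRED = [
  "strict typescript",
  "never invent api fields",
  "source of truth",
];

test("system prompt keeps its architectural constraints", () => {
  for (const phrase of REQUIRED) {
    assert.ok(prompt.includes(phrase), `prompt is missing: "${phrase}"`);
  }
});

test("system prompt stays small enough to stay sharp", () => {
  // Crude proxy for a token budget; tune to taste.
  assert.ok(prompt.length < 4000, `prompt is ${prompt.length} characters`);
});
```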
3. Agents cannot be trusted
Even with all the belts, suspenders, and duct tape, LLMs are designed to please you. They want to answer your question. This desire to please leads to hallucinations.
Sometimes they’re obvious (“I have updated the file” when the file is untouched). Sometimes they’re subtle and insidious (introducing a security vulnerability because it “looked” like a standard pattern).
The insidious ones drive me nuts. A couple of days ago I spent over an hour debugging a problem, only to discover that at some point the agent had broken an API request by adding a non-existent field because it thought it should be there for "user clarity." The change was buried four steps deep in a complex workflow, and no version of that API endpoint has ever had a parameter remotely like the one it invented. I'll never get that hour back.
We have tools to mitigate this, and more testing happens before I ever commit to main, but it’s still a problem. I’m not smart enough to know if we’ll ever call this “solved.” But I know we won’t call it solved in 2026.
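Here's the flavor of guard I mean, sketched with zod and made-up endpoint and field names: validate every outgoing request against a schema that rejects unknown keys, so an invented field blows up immediately instead of deep inside a workflow.

```ts
import { z } from "zod";

// The request contract, with .strict() so unknown keys are a hard error
// instead of being silently forwarded to the API.
const UpdateDeviceRequest = z
  .object({
    deviceId: z.string(),
    displayName: z.string().optional(),
  })
  .strict();

try {
  UpdateDeviceRequest.parse({
    deviceId: "abc-123",
    displayName: "Lobby TV",
    userClarityLabel: "friendlier name", // the kind of field an agent invents
  });
} catch (err) {
  // Fails here, in a test or pre-flight check, instead of four steps
  // deep inside a production workflow.
  console.error(err);
}
```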
4. Agents have already changed the game
If you read the marketing rags or Twitter/X, this is still a debatable point. People love to dunk on AI.
But out here on the edge? This is settled law. You may not have felt the shockwave in your particular industry yet, but the explosion has already happened. There is no outrunning it.
I built BrandCast and FamilyCast—two distinct products with real codebases, marketing strategies, and infrastructure—essentially by myself. That wasn’t possible for me two years ago. The leverage ratio has shifted permanently.
5. Agents are absolutely essential
I can't think of a time the world ever slowed down once it got started on something. Cars never went backwards. The internet didn't "un-happen."
We are entering the Era of Personal Software. The barrier to creating bespoke tools for your own life has collapsed.
As I tell my agents far too often: "We're only going forward. There is no going back."
There will be lumps and bumps. I will continue to yell at my screen when Gemini gets lazy. But we will have better solutions in our lives because of this technology.
Here’s to 2026. I hope it’s just as chaotic.