Yesterday I spent hours working with a fleet of specialized development agents to build a complex feature. It was the kind of challenge where lots of interconnected pieces have tangled, interdependent relationships. Weeks to wade through, normally. I called it a “3 Diet Pepsi problem”.
The next morning I asked one of the agents to clean up the feature branch. It decided to skip commits and cherry-pick instead (and maybe set fire to a trash can somewhere?). It broke the entire application. Spectacularly.
It took hours and millions of tokens to untangle what had happened. It was worse than starting over.
The Cost
Here’s what that looks like in real numbers:
- Time: 4+ hours of debugging and recovery and cursing and typing in all caps
- Tokens: Several million (at $3 per million input tokens and $15 per million output tokens for Sonnet, that adds up)
- Mental overhead: The frustration of watching AI undo AI’s own work
- Opportunity cost: Features I didn’t ship while fixing this
The irony isn’t lost on me. I build products that use AI. I work with AI agents every day. I’m bullish on this technology. And yet here I am, dealing with a mess created by the very tools I depend on for my day job and encourage other people to use.
What Went Wrong
The feature branch cleanup should have been straightforward:
- Review commits
- Squash or reword if needed
- Merge cleanly into main
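In git terms, that’s roughly the following (a sketch, assuming the feature branch was cut from main; “feature-branch” is a placeholder name):
git log --oneline main..HEAD          # review what's on the branch
git rebase -i main                    # squash or reword commits as needed
git checkout main
git merge --ff-only feature-branch    # fast-forward merge, nothing clever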
Instead, the agent:
- Decided to cherry-pick specific commits (why?)
- Skipped others (which ones? why those?)
- Created conflicts that cascaded through the codebase
- Left the application in a broken state. I can’t overstate how broken. Like a drunk toucan who knows how to use sed broken.
I didn’t ask it to cherry-pick. I asked it to “clean up the branch.” Somewhere between my intent and its execution, the interpretation went sideways.
My Response (Not Giving Up)
My immediate reaction wasn’t to blame the AI or swear off agent-assisted development. OK, it was for a little while. But I persevered, and the useful response was to look at my setup.
If an AI agent is consistently misunderstanding my intent or making destructive decisions, that’s a signal. Either:
- My context files aren’t clear enough
- My agent configurations are too permissive
- I’m not giving enough guardrails for risky operations
So I’m doing what I’d do for any recurring bug: root cause analysis.
What I’m Analyzing
1. CLAUDE.md files (project and global context)
These files tell Claude Code what it should and shouldn’t do. Mine probably need:
- Explicit “DON’T” statements for git operations
- Required confirmation steps for destructive actions
- Clearer definitions of what “clean up” means in different contexts
2. Agent configurations
Each specialized agent has its own prompt and permissions. I need to audit:
- Which agents have access to git commands
- What level of autonomy they have
- Whether there are implicit assumptions I’m making that aren’t documented
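A quick, low-tech way to start that audit, assuming agents are defined as markdown files under .claude/agents/ (project) and ~/.claude/agents/ (global) with their tool access declared in frontmatter:
# List agent definitions that explicitly grant Bash access (and therefore git)
grep -rl "Bash" .claude/agents ~/.claude/agents 2>/dev/null
# Caveat: agents that declare no tools at all inherit everything,
# so no match here does not mean no git access.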
3. My own prompts
Sometimes the problem is what I’m asking. “Clean up the branch” is vague. It could mean:
- Squash commits into one
- Reword commit messages
- Remove unnecessary commits
- Rebase onto main
- Cherry-pick specific changes (what the agent chose)
I need to be more specific. If I mean “squash all commits into one,” I should say that.
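Those readings aren’t interchangeable; each maps to different git commands. A sketch of how differently they play out (branch and message names are placeholders):
# "Squash all commits on this branch into one":
git reset --soft "$(git merge-base main HEAD)"
git commit -m "Add feature X"
# "Rebase onto main, preserving all commits":
git rebase main
# "Reword the last commit message":
git commit --amend -m "Better message"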
Practical Lessons
After years of working with AI tools, here’s what I’m learning:
1. Vague prompts get creative interpretations
“Clean up” means different things to different people. To an AI agent with access to git, it might mean “use every git command you know to make this prettier.”
Be specific. Say what you want and what you don’t want; otherwise you get the kind of result where an LLM goes all the way down the road to trigonometry just to do simple addition.
2. Context files are your safety net
Your CLAUDE.md files aren’t just nice-to-have documentation. They’re the difference between an AI agent that helps you and one that breaks things confidently.
Treat them like configuration files: version them, review them, update them when you spot gaps.
3. The cost of fixing AI mistakes is real
I love working with AI agents. They’re incredibly powerful. But when they mess up, the cleanup can be expensive—in time, tokens, and frustration.
That means the ROI calculation isn’t just “how much time did the AI save me?” It’s also “how much time do I lose when it breaks something?” I call this “mop-up time”, and keeping it short is my measure of success for a work session with an AI agent.
4. Risky operations need guardrails
Git operations that rewrite history (rebase, cherry-pick, reset) should probably:
- Require explicit user confirmation
- Have clear rollback paths
- Be limited to specific agents with appropriate safeguards
I wouldn’t give a junior developer unsupervised access to force-push. Why would I give an AI agent that power without constraints?
What Good Looks Like
Here’s what I’m working towards:
Better CLAUDE.md:
## Git Operations - Safety Rules
NEVER rewrite git history without explicit user confirmation.
When asked to "clean up" a branch:
1. Show the current commit structure
2. Ask what specific cleanup is wanted
3. Confirm the approach before executing
DON'T:
- Cherry-pick commits without being asked
- Skip commits without explanation
- Force-push without confirmation
- Rebase without showing the plan first
Better prompts from me:
- “Squash all commits on this branch into one commit with message: X”
- “Rebase this branch onto main, preserving all commits”
- “Show me the commit history, then suggest cleanup options”
Better agent configs:
- Certain operations require human-in-the-loop
- Destructive git commands trigger confirmation prompts
- Agents explain their reasoning before executing risky changes
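With Claude Code, part of this can live in project settings rather than prose. Here’s a sketch of the kind of deny list I mean in .claude/settings.json; the exact rule syntax is an assumption on my part, so check it against the current permissions docs before relying on it:
{
  "permissions": {
    "deny": [
      "Bash(git push --force:*)",
      "Bash(git rebase:*)",
      "Bash(git cherry-pick:*)",
      "Bash(git reset --hard:*)"
    ]
  }
}
Deny rules like these don’t replace the CLAUDE.md guidance above; they’re the backstop for the times the guidance gets ignored.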
The Meta-Lesson
AI agents are powerful tools. They can do in minutes what would take hours manually. But they’re tools, not magic.
When a tool breaks something, you don’t throw it away. You learn to use it better.
That means:
- Clearer instructions
- Better safety constraints
- Tighter feedback loops
- Honest assessment of when the tool is helping vs. hurting
I’m still bullish on AI-assisted development. But I’m also realistic about the work required to make it reliable.
This incident cost me hours. The goal is to make sure it doesn’t cost me hours again.
What I’m Doing Next
- Audit my CLAUDE.md files - Add explicit constraints around git operations
- Review agent configurations - Identify which agents need tighter guardrails
- Document my learnings - So I remember this the next time I’m tempted to ask an agent to “just clean up the branch”
- Share what I find - Because if I’m hitting this, other people probably are too
AI agents are going to break things. That’s a given. The question is whether we learn from it.
I’m choosing to learn.
If you’re working with AI agents and have strategies for keeping them from breaking things, I’d love to hear about them. You can find me on LinkedIn or GitHub.