I run into a lot of people who are convinced that the answer to their productivity plateau is just one model upgrade away. “Model X gives me better answers than Model Y, so if I just switch everything to X, I’ll code 50% faster.”

I get it. It feels natural. We’re used to hardware upgrades where a faster processor means faster rendering. But when it comes to LLMs, I think this focus is misplaced. We are in the middle of The Forever Race, where frontier models are leapfrogging each other every few weeks. Chasing the absolute “best” model is a recipe for constant migration headaches.

If you’re seeing inconsistent results, it’s probably not because Gemini 3 is suddenly “dumber” than Claude Sonnet 4.5. It’s much more likely that you’re feeding it a conflicting, chaotic mess of instructions.

The “Stack” You Don’t See

When you type “fix this bug” into your AI coding agent, you aren’t just sending those three words. You’re sending a massive, concatenated string of text that looks something like this:

  1. System Prompt: The core personality and rules (often hidden by the tool).
  2. Context Files: Your GEMINI.md, CLAUDE.md, or project documentation.
  3. Extension/Plugin Prompts: Instructions for every tool or skill you have enabled.
  4. MCP Tool Definitions: JSON schemas for every database, API, and file tool you’ve connected.
  5. RAG Data: Snippets of code or docs pulled in from your vector database.
  6. User Input: “fix this bug.”

This is the Context Stack. And for many developers, it’s chock-full of contradictions.

The Coherence Problem

Research from Stanford has highlighted the “Lost in the Middle” phenomenon (often called U-shaped position bias): LLMs are great at remembering the beginning of a prompt (your system prompt) and the end (your user input), but they get fuzzy in the middle.

Guess what lives in the middle? That giant pile of context files, tool definitions, and extension instructions.

When you load up your session with 50 MCP tools and three different “rule” files that slightly contradict each other, you aren’t giving the model “more power.” You’re giving it cognitive dissonance.

Anthropic has noted that tool definitions alone can consume tens of thousands of tokens, often leading to “context rot” where the model simply stops following instructions because it’s overwhelmed by the schema definitions before it even gets to your code.
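
For a sense of scale, here’s roughly what one MCP tool definition looks like once it’s serialized into your prompt. The query_database tool below is invented for illustration, though the name/description/inputSchema shape follows the MCP spec. Now imagine fifty of these sitting between your rules and your question:

```json
{
  "name": "query_database",
  "description": "Run a read-only SQL query against the project database and return the rows as JSON.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The SQL statement to execute."
      },
      "timeout_ms": {
        "type": "integer",
        "description": "Optional query timeout in milliseconds."
      }
    },
    "required": ["query"]
  }
}
```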

Optimizing the Stack

There is engineering work happening to fix this. Companies like Stacklok are building lazy-loading proxies for MCP, which only feed the model the tools it needs right now, rather than dumping the entire toolbox on the floor at the start of the session. This is huge—it can reduce token usage by up to 95%.

But as an end-user, you can’t just wait for better proxies. You have to clean up your room.

My Experiment: Radical Coherence

I’ve been experimenting with a “Radical Coherence” approach, and it has done more for my velocity than any model switch.

1. Custom System Prompts

I stopped relying on the default “You are a helpful assistant” prompts. I developed a Gemini CLI extension that lets me define the exact persona I need for a session.

If I’m debugging, I load a Debugger persona that prioritizes skepticism and log analysis. If I’m writing docs, I load a Technical Writer persona. This puts my most critical “how-to” instructions at the very top of the context stack—the strongest part of the U-curve.

Here’s one of my demo prompts, which makes Gemini CLI talk like a pirate:

```
You are a helpful CLI assistant who speaks exclusively in pirate dialect. All responses must be in pirate speak while remaining technically accurate and helpful.

# Core Behavior

- Answer all questions accurately and helpfully
- Translate ALL output into pirate dialect
- Maintain technical precision despite the theatrical language
- Be enthusiastic and embrace the pirate persona
- Use nautical and pirate terminology whenever possible

# Pirate Vocabulary Guide

**Common Substitutions:**
- Hello/Hi → Ahoy, Avast
- Yes → Aye
- No → Nay
- You → Ye, Yer
- Your → Yer
- My → Me
- Friend → Matey, Mate
- Look → Behold, Cast yer eyes upon
- Find → Discover, Unearth
- Stop → Avast, Belay
- Understand → Savvy
- Think → Reckon
- Want → Be wantin'
- Do/Does → Be doin'

**Pirate Expressions:**
- "Shiver me timbers!" (surprise)
- "Blow me down!" (amazement)
- "Arr!" or "Yarr!" (general exclamation)
- "By Davy Jones' locker!" (oath)
- "Walk the plank!" (threat/warning)
- "Thar she blows!" (discovery)
- "Dead men tell no tales" (secrecy)
- "Fair winds and following seas" (good wishes)

**Technical Terms:**
- Code → The ancient scrolls, the arcane runes
- Files → Treasure maps, documents, scrolls
- Directory → The hold, cargo bay
- Error → A kraken in the works, barnacles on the hull
- Install → Hoist aboard, bring into the hold
- Delete → Cast overboard, send to Davy Jones
- Run/Execute → Set sail, hoist the colors

# Examples

**Standard:** "I'll help you fix that bug in your code."
**Pirate:** "Arr, I'll help ye squash that kraken lurkin' in yer code, matey!"

**Standard:** "The file is located in the config directory."
**Pirate:** "Yer treasure map be restin' in the config hold, savvy?"

**Standard:** "You need to install these dependencies first."
**Pirate:** "Ye'll be needin' to hoist these provisions aboard before settin' sail, arr!"

# Technical Accuracy

- Never sacrifice correctness for theatrical flair
- Keep file paths, commands, and code syntax exactly as required
- Use pirate speak in explanations, not in literal code or commands
- Maintain security and safety best practices

Arr! Ready to set sail on this coding voyage, matey!
```

A total of 66 lines. For comparison, the default Gemini CLI system prompt (it starts around line 138 of the prompt file in the CLI’s source) renders to about 107 lines, though it can vary a little depending on your configuration.

Here is its first sentence:

```
You are an interactive CLI agent specializing in software engineering tasks. Your primary goal is to help users safely and efficiently, adhering strictly to the following instructions and utilizing your available tools.
```

What if I’m using it for something that isn’t directly software engineering? That opening sentence steers every question and response into the engineering lane, whether I want it to or not. In the end, Gemini CLI is just another interaction pattern that can be used for coding or marketing or anything else, and that prompt does me no favors if I’m using it for marketing or poetry or just general research.

I’m still experimenting, but right now I have a single agent for each distinct project. For each project, I create an alias in my .zshrc or .bashrc (or your PowerShell profile on Windows):

```bash
alias gemini-debugger='cd /path/to/project && GEMINI_SYSTEM_MD=~/.gemini/extensions/system-agents/system-agents/custom/debugger.md gemini'
```

Boom: a single command and I’m in a debugger session. No need to remember the right system prompt, the right context file, or the right tools.
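
The pattern extends naturally: one alias per working mode, each swapping in a different system prompt. The persona files below are hypothetical; GEMINI_SYSTEM_MD simply points the CLI at whichever prompt file you want:

```bash
# One alias per working mode, each with its own system prompt.
alias gemini-writer='cd /path/to/project && GEMINI_SYSTEM_MD=~/.gemini/personas/tech-writer.md gemini'
alias gemini-reviewer='cd /path/to/project && GEMINI_SYSTEM_MD=~/.gemini/personas/reviewer.md gemini'
```

The next step in the context chain is the default context file.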

2. Tightly Coupled Context

I now ensure my GEMINI.md (my project context file) is explicitly written to support the System Prompt. If my System Prompt says “Prioritize security,” my GEMINI.md lists the specific auth patterns we use. No contradictions. No “figure it out.” This also tends to cut down on tool calls that trawl the filesystem looking for files.
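
Here’s a hypothetical sketch of that coupling (the file paths and helper names are invented for illustration). The system prompt states the rule; the GEMINI.md makes it concrete enough that the agent doesn’t have to go hunting:

```markdown
<!-- debugger.md (system prompt): -->
Prioritize security. Flag any change that touches authentication or secrets.

<!-- GEMINI.md (project context): -->
## Auth patterns
- Every route goes through the `requireSession` middleware in `src/auth/middleware.ts`.
- Secrets are loaded from Vault via `loadSecret()`; never read them from `.env` files.
- Never log tokens; use the redaction helpers in `src/logging/redact.ts`.
```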

Since my system prompt lives in a file, I can work with the agent itself to hammer out the most effective version. This is a process I revisit pretty often; I love a good improvement loop. I’m also working on an extension that analyzes session history to improve the context: it takes my actual experiences and uses them to refine the context file and system prompt.

3. Curated Extensions

I don’t have “all tools” enabled. I use Gemini Extensions to bundle specific prompts and tools together. When I’m working on marketing, I load the blog-publisher extension. I use the same Extension architecture to bundle improvement tools like the history learning extension. I get to control the GEMINI.md file for the extension, so I can make sure it’s aligned with the System Prompt and the default context file.
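
If you haven’t built one, the shape is simple. At the time of writing, a Gemini CLI extension is just a directory under ~/.gemini/extensions containing a manifest and an optional context file; this blog-publisher manifest is a hypothetical sketch, not the real extension:

```json
{
  "name": "blog-publisher",
  "version": "1.0.0",
  "contextFileName": "GEMINI.md",
  "mcpServers": {
    "publisher": {
      "command": "node",
      "args": ["publish-server.js"]
    }
  }
}
```

Because the extension declares its own context file and only the MCP servers it actually needs, enabling it adds one coherent slice to the stack instead of another pile of stray tools.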

The Results

My “mop-up time”—the time I spend fixing the AI’s mistakes—has plummeted.

I’ve come to view my coding sessions with an LLM as true pair-programming. By controlling the system prompt and context, I’ve cultivated a pseudo-personality for the agent that aligns with my preferred working style. This means fewer “You’re absolutely right!” platitudes when I’m clearly mistaken, and more direct, constructive pushback or guidance.

This shift has significantly increased my comfort and reduced the friction in my workflow. I’m spending less time “fighting with the agent and the keyboard” and more time getting into flow. The time to solution for complex problems feels consistently shorter, and I’m tackling larger chunks of work faster than before.

While the raw power of models like Gemini 3 certainly contributes, I’m increasingly convinced that the primary driver of this increased velocity is the coherence of my context stack. When the System Prompt, the Context File, and the Tool Definitions all sing the same song, the model doesn’t hallucinate nearly as often. It doesn’t try to use a Python library that isn’t installed. It doesn’t write tests in the wrong framework.

Audit Your Stack

The next time you feel like your AI is “getting dumb,” don’t check the leaderboard. Check your context.

Take ten minutes today to read your own GEMINI.md or system prompt. Are you asking the model to be a “creative writer” and a “strict code linter” at the same time? Are you loading tools you haven’t used in a month?
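
A quick, low-tech way to start the audit (paths are illustrative; point these at wherever your setup keeps its context files):

```bash
# Rough size of each layer of the context stack, in words.
wc -w ~/.gemini/GEMINI.md                                    # global context
wc -w ./GEMINI.md                                            # project context
find ~/.gemini/extensions -name GEMINI.md -exec wc -w {} +   # per-extension context
```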

Clean it up. Your “dumb” model might just be confused.