I’ve been working with AI coding agents for the past few months, and we’ve hit an interesting problem. It’s not a bug. It’s not a missing feature. It’s something much more familiar: dependency hell.
Except this time, it’s not npm packages or Python libraries. It’s Model Context Protocol (MCP) servers.
The Problem Anthropic Just Named
Anthropic published a blog post that finally named what we’d been experiencing: MCP context bloat. When you connect an AI agent to multiple MCP servers, every single tool definition loads into the context window upfront.
The numbers are brutal. Connect to a few popular MCP servers, and you can easily consume 50,000+ tokens before the agent even looks at your code. As Simon Willison noted, the GitHub MCP alone defines 93 tools and eats 55,000 tokens.
Run 20 MCP servers, as some setups recommend? You’re out of context after five short prompts.
This felt familiar. Too familiar.
This Is Just Dependency Management
If you’ve ever run npm install and watched your node_modules folder balloon to 300MB for a “hello world” app, you know this pattern. We learned these lessons with package managers:
- More dependencies = more surface area for problems
- Transitive dependencies compound the issue
- You pay the cost whether you use them or not
The software supply chain security world has been screaming about this for years. According to recent studies, 84% of codebases include at least one known open-source vulnerability. Third-party code is now responsible for 30% of breaches.
Craig McLuckie, who I worked with at VMware on Tanzu, saw this coming. He co-founded Stacklok to tackle software supply chain security, building tools that help developers understand and control what they’re actually depending on. One interesting product of theirs acts as a proxy between your agent and the frontier LLM that powers it, editing MCP calls on the fly so that only what you actually need reaches the model. Very interesting, with some potential sharp edges.
MCP needs the same thinking.
What We’re Doing About It
We run a marketing AI agent that manages three blogs, integrates with Google Analytics, generates images, schedules tasks, and more. It would be easy to connect 20 MCP servers. We’re running six.
Here’s how we decide:
1. Start with Zero, Add with Purpose
Every MCP server has to justify its token cost. We don’t install something because it might be useful. We add it when we hit a specific, recurring need.
Example: We added the Google Analytics MCP after manually checking stats became a daily task. The scheduler MCP came after we needed reliable automation. Both earned their context space.
2. Prefer CLI Over MCP When Possible
If the agent can already run terminal commands and there’s a good CLI tool (like GitHub’s gh), we use that instead. As Anthropic’s research shows, CLI tools cost almost zero tokens compared to MCP definitions.
Example: We use gh commands for GitHub operations instead of the GitHub MCP. Same functionality, 98.7% fewer tokens.
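As a sketch of what that substitution looks like in glue code, assuming `gh` is installed and authenticated (the wrapper names here are illustrative, not part of any tool):

```python
import subprocess

def gh_argv(*args: str) -> list[str]:
    """Build the argv for a GitHub CLI call.

    Kept separate from execution so the mapping from intent to
    command is easy to inspect and test.
    """
    return ["gh", *args]

def run(argv: list[str]) -> str:
    """Execute a command and return stdout; assumes the tool exists."""
    return subprocess.run(argv, capture_output=True, text=True,
                          check=True).stdout

# Operations the GitHub MCP exposes as dozens of tool definitions map
# onto plain CLI calls that cost no upfront context, e.g.:
#   run(gh_argv("issue", "list", "--limit", "5"))
#   run(gh_argv("pr", "view", "123"))
```

The point isn’t the wrapper; it’s that the agent’s existing terminal access already covers these operations without loading a single tool schema.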
3. Consolidate Where Reasonable
Some MCP servers try to do too much. Others do one thing well. We prefer focused servers and accept that we might need to write some glue code.
Example: Our Gemini MCP handles both chat and image generation. That’s two distinct capabilities, but they share authentication and are frequently used together. We keep it.
4. Audit Regularly
Every few weeks, we check which MCPs are actually being used. If something hasn’t been called in a month, it probably doesn’t need to be loaded all the time.
This mirrors what dependency scanning tools do for traditional supply chains. You can’t manage what you don’t measure.
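A minimal sketch of that staleness check, assuming your client writes a call log somewhere (the log format below is invented for illustration; real clients vary):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Invented log format: one "date server tool" entry per MCP call.
SAMPLE_LOG = [
    "2025-01-10 analytics get_report",
    "2025-02-01 gemini generate_image",
    "2025-02-03 analytics get_report",
]

def stale_servers(log, configured, now, max_idle_days=30):
    """Return configured servers with no calls in the last max_idle_days."""
    last_seen = {}
    for entry in log:
        date_str, server, _tool = entry.split()
        when = datetime.strptime(date_str, "%Y-%m-%d")
        last_seen[server] = max(last_seen.get(server, when), when)
    cutoff = now - timedelta(days=max_idle_days)
    # Servers never seen in the log are stale by definition.
    return sorted(s for s in configured
                  if last_seen.get(s, datetime.min) < cutoff)
```

Anything this reports is a candidate for unloading, not automatic removal; a server used once a quarter for a critical task may still earn its place.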
The Broader Pattern
The software supply chain security community has frameworks for this. SLSA (Supply-chain Levels for Software Artifacts) provides a maturity model. SBOMs (Software Bill of Materials) give you visibility. Tools like Stacklok’s Minder apply controls across the entire lifecycle.
MCP doesn’t have those frameworks yet, but the problems are identical:
- Visibility: What am I actually depending on?
- Control: How do I limit what gets loaded?
- Verification: Is this dependency doing what I expect?
- Maintenance: How do I keep this manageable over time?
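The visibility piece, at least, is easy to sketch. Here’s a minimal script that reads a Claude Desktop-style config and lists the servers it declares; the `mcpServers` shape is an assumption based on common client configs, not a standard:

```python
import json

# Hypothetical config snippet in the "mcpServers" layout several
# clients use. The exact shape varies; this is for illustration.
SAMPLE_CONFIG = """
{
  "mcpServers": {
    "github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
    "analytics": {"command": "python", "args": ["analytics_server.py"]},
    "scheduler": {"command": "node", "args": ["scheduler.js"]}
  }
}
"""

def list_servers(config_text: str) -> list[str]:
    """Return the names of the MCP servers declared in the config."""
    config = json.loads(config_text)
    return sorted(config.get("mcpServers", {}))

if __name__ == "__main__":
    for name in list_servers(SAMPLE_CONFIG):
        print(name)
```

Even this much, run against your real config, answers the first question: what am I actually depending on?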
The EU Cyber Resilience Act is forcing supply chain discipline with penalties up to €15 million for non-compliance. AI agents will need similar discipline, even without regulation.
What’s Next
Anthropic’s solution is elegant: present MCPs as code APIs that the agent can selectively load and use. This drops token usage from 150,000 to 2,000 in their example. That’s the right direction.
But the cultural shift matters more than the technical solution. We need to treat MCP servers like dependencies:
- Document what you’re using and why
- Audit regularly
- Prefer focused, single-purpose servers
- Accept that “install everything” is not a strategy
I’d love to see someone build the equivalent of npm audit or cargo tree for MCP configurations. Show me which servers are loaded, how many tools they expose, and what my actual token footprint is before I run a single prompt.
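A rough sketch of what such a tool might report, assuming you can get each server’s tool definitions as JSON (the definitions below are simplified stand-ins, and the ~4-characters-per-token estimate is a heuristic, not a real tokenizer):

```python
import json

# Hypothetical tool definitions, loosely shaped like an MCP tools/list
# response. Real servers expose far larger schemas than these.
SERVERS = {
    "github": [
        {"name": "create_issue", "description": "Create a GitHub issue",
         "inputSchema": {"type": "object",
                         "properties": {"title": {"type": "string"}}}},
        {"name": "list_pulls", "description": "List pull requests",
         "inputSchema": {"type": "object",
                         "properties": {"state": {"type": "string"}}}},
    ],
    "scheduler": [
        {"name": "schedule_task", "description": "Schedule a recurring task",
         "inputSchema": {"type": "object",
                         "properties": {"cron": {"type": "string"}}}},
    ],
}

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English and JSON.
    return len(text) // 4

def audit(servers: dict) -> dict:
    """Report tool count and estimated token footprint per server."""
    report = {}
    for name, tools in servers.items():
        blob = json.dumps(tools)
        report[name] = {"tools": len(tools),
                        "est_tokens": estimate_tokens(blob)}
    return report
```

Swap in a real tokenizer and real `tools/list` output and you’d have a crude `mcp audit`: total footprint up front, before the first prompt spends a token.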
Until then, we’re sticking with our rule: every MCP server has to justify its existence. Not just once during installation, but continuously.
Because context windows, like dependency trees, are finite resources. And we learned the hard way what happens when you treat finite resources like they’re infinite.
What’s your approach to managing MCP servers? Are you running into context bloat? I’m curious what patterns are emerging in the community.