In my day job at Google Cloud, we were talking today about how Mythos and Google’s CodeMender map into an SDLC. The deep AI security agents run in batches. Nightly, weekly, asynchronously. They’re the backstops. So something should live in front of them, inside the flow, catching the cheap stuff before it ever reaches the slow scan. But what’s out there, and how do we help customers understand it?
The vendors are building the front. Anthropic shipped a security plugin for Claude Code in May. Google folded CodeMender into its agent platform at Google I/O. This isn’t tomorrow’s challenge. It’s shipping now.
So here’s my take. What the vendors are now putting in front of the backstop, what you still have to build yourself, and why the whole point is to make the deep scan boring.
The backstop is real, and mostly locked away
Start with the heavy machinery. Google DeepMind’s CodeMender was announced in October 2025. It uses Gemini to find, fix, and validate vulnerabilities, and over the six months of development leading up to that announcement it upstreamed 72 security fixes to open-source projects, some of them as large as 4.5 million lines of code. Every patch gets human sign-off before it ships.
Anthropic’s Claude Mythos Preview is further out on the curve. It found thousands of previously unknown zero-days across every major operating system and browser. Anthropic decided it was too dangerous to release, so they put it behind Project Glasswing: scoped access to ~60 mostly giant corporations, backed by $100M in credits, instead of a public launch.
Notice the shape. Batch cadence. Governed. Human-in-the-loop. The most capable agent isn’t something you can buy. These are backstops by design.
The dread that attaches to a deep scan comes from treating it as a verdict handed down late. The more useful model is a backstop: the last line of defense, not the first. Its real output should be a small set of high-confidence fixes, not a flood of tickets or (even worse) surprises. The way to make a backstop welcome is to keep what reaches it small. The game is signal-to-noise and time-to-fix, not detection.
The vendors are building the front now
Here’s what changed. On May 27, 2026, Anthropic shipped a security-guidance plugin for Claude Code. It’s free on every plan, and it runs as three checkpoints in the flow:
- File edit: a fast, deterministic pattern match with no model call, catching constructs like eval(), os.system(), and dangerouslySetInnerHTML.
- End of turn: an Opus diff review that catches logic-level bugs string matching can’t see. Authorization bypass, insecure direct object references, server-side request forgery, weak crypto.
- Commit time: a deeper agentic review when the agent commits or pushes.
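To make the first checkpoint concrete, here is a minimal sketch of what a deterministic, no-model pattern match can look like. This is not Anthropic’s implementation. The rule names, the `PatternHit` type, and the `scan_text` function are all hypothetical, and the rule table covers only the three constructs named above.

```python
import re
from typing import NamedTuple

# Hypothetical rule table: only the three constructs mentioned in the post.
RULES = {
    "eval-call": re.compile(r"\beval\s*\("),
    "os-system-call": re.compile(r"\bos\.system\s*\("),
    "dangerous-inner-html": re.compile(r"\bdangerouslySetInnerHTML\b"),
}


class PatternHit(NamedTuple):
    rule: str
    line: int
    text: str


def scan_text(source: str) -> list[PatternHit]:
    """Return one hit per (line, rule) match. No model call, no network."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                hits.append(PatternHit(rule, lineno, line.strip()))
    return hits
```

A check like this is cheap enough to run on every edit, which is the whole appeal. It is also blind to anything that isn’t a literal string pattern, which is why the later, model-backed checkpoints exist.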
Read that list again. Shallow and cheap early, deep and expensive late, wired straight into the editor. That’s the exact thing I was going to argue teams should build. Anthropic built it and gave it away.
Google did the platform version. At I/O 2026 it folded CodeMender into the Gemini Enterprise Agent Platform, governed so patches apply “with your approval.” Worth noting: an analyst quoted in InfoWorld’s coverage of the integration points out there’s still no published data on false-positive or regression rates. Governed doesn’t mean measured yet. But early results are extremely positive.
So the reframe is this. The in-flow layer is arriving as product. But it’s partial. The Claude Code plugin covers the authoring gate inside one assistant. Your SDLC has more gates than that, and most of them are still yours to build.
The flow already has gates. Use all of them.
Every code change passes through natural checkpoints between idea and production. Editor. Commit. Pull request. CI. Deploy. Run fast, shallow checks early. Reserve slow, deep analysis for late. A finding caught when it costs a keystroke beats the same finding caught when it costs an incident.
A secret caught by an editor hook never enters the commit. A vulnerable dependency caught in CI never ships. The vendor plugin now owns a piece of the first gate. You still own the rest.
Why AI-generated code raises the stakes
Veracode’s 2025 GenAI Code Security Report tested more than 100 models and found that 45% of AI-generated samples introduced an OWASP Top 10 weakness. The same report clocks AI code at 2.74x more vulnerabilities than human-written equivalents, with Java failing over 70% of the time. The Spring 2026 follow-up found the models still haven’t fixed it.
The implication is architectural. The system that generates code and the system that checks it should not be the same system. An independent checker that scans each AI edit the moment it lands catches the generator’s blind spots before a human reviews the diff. That authoring-time gate is the part most teams skip. The good news is you can now get a real chunk of it for free.
I’ve made the safety-versus-speed version of this argument before, in “When Your AI Agent Decides to Skip the Tests.” The security version is the same shape. An independent loop checking the agent’s output is worth more than the agent’s own confidence.
Build the rest yourself
Even with the plugin running, you own the gates it doesn’t cover and the tools it doesn’t run. The authoring gate is the most DIY-able piece. Claude Code has a PostToolUse hook that fires the moment the agent writes a file, and several other harnesses offer comparable hooks. The hook below runs a scanner on just that file and, when the scan fails, writes the findings to stderr and exits with code 2. In Claude Code, that is the exit code that feeds stderr back into the agent’s context so it fixes its own output; other non-zero codes are only reported to the user:
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{
        "type": "command",
        "command": "f=$(jq -r '.tool_input.file_path'); semgrep scan --config auto --error --quiet \"$f\" >&2 || exit 2"
      }]
    }]
  }
}
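Once the hook grows past a one-liner, the same logic is easier to maintain as a small script. The sketch below is one way to write it, under a few assumptions: the hook payload arrives as JSON on stdin with the edited path at `tool_input.file_path` (as in the config above), exit code 2 is what routes stderr back to the agent (my reading of the Claude Code hooks documentation), and the `SCANNER` command line is just an example. The names `handle_event`, `run_scanner`, and `main` are mine, not part of any harness API.

```python
import json
import subprocess
import sys
from typing import Callable, Sequence

# Assumed scanner invocation; swap in whatever ruleset you actually run.
SCANNER = ["semgrep", "scan", "--config", "auto", "--error", "--quiet"]

Runner = Callable[[Sequence[str]], tuple[int, str]]


def run_scanner(argv: Sequence[str]) -> tuple[int, str]:
    """Run the scanner and return (exit code, combined output)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def handle_event(raw: str, runner: Runner = run_scanner) -> tuple[int, str]:
    """Map one hook payload to (hook exit code, message for the agent)."""
    try:
        path = json.loads(raw).get("tool_input", {}).get("file_path")
    except (json.JSONDecodeError, AttributeError):
        return 0, ""  # malformed payload: stay out of the agent's way
    if not path:
        return 0, ""  # nothing was written to disk, nothing to scan
    code, output = runner([*SCANNER, path])
    if code == 0:
        return 0, ""
    # Any scanner failure (findings or a scanner error) is surfaced as exit 2.
    return 2, f"Security scan failed for {path}:\n{output}"


def main() -> int:
    exit_code, message = handle_event(sys.stdin.read())
    if message:
        print(message, file=sys.stderr)
    return exit_code
```

To use it as the hook command, end the script with `sys.exit(main())` and point the `command` field at it. Passing the runner in as a parameter is a deliberate choice: it lets you test the exit-code mapping without having Semgrep installed.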
Below that, the plumbing is boring and reliable. The pre-commit framework runs gitleaks and a thin Semgrep ruleset at commit (keep it sub-second or it gets disabled). GitHub Actions runs the heavier pass on push with reviewdog posting inline PR comments. CI runs OSV-Scanner, Trivy, Syft for an SBOM, and cosign for signing.
What you own is the unglamorous part. Triage, dedup, false-positive suppression, keeping rulesets current. The tools are free. The maintenance is the price. I argued the same shift-left logic in “Shift Left CI/CD,” where most of my validation already lives in local hooks instead of a central pipeline.
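Dedup and suppression sound abstract until you write them down. Here is a rough sketch of the core idea, assuming findings have already been parsed into a common shape. The `Finding` fields, the fingerprint recipe, and the `triage` function are illustrative choices of mine, not the format of any particular scanner.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    tool: str      # which gate reported it (editor hook, pre-commit, CI, ...)
    rule: str
    path: str
    snippet: str


def fingerprint(f: Finding) -> str:
    """Stable ID that ignores the reporting gate and whitespace changes."""
    normalized = " ".join(f.snippet.split())
    raw = f"{f.rule}|{f.path}|{normalized}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def triage(findings: list[Finding], suppressed: set[str]) -> list[Finding]:
    """Drop suppressed fingerprints, then keep the first finding per fingerprint."""
    seen: set[str] = set()
    kept = []
    for f in findings:
        fp = fingerprint(f)
        if fp in suppressed or fp in seen:
            continue
        seen.add(fp)
        kept.append(f)
    return kept
```

The fingerprint deliberately leaves out line numbers, so a finding keeps its identity when unrelated code above it moves. Deduplicating across different scanners is harder, because each uses its own rule IDs, and would need a mapping this sketch doesn’t attempt.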
What a platform actually buys you
Open source detects fine. A vendor platform isn’t selling you detection. It’s selling three things open source is weak at.
- Signal. Reachability analysis suppresses findings in code that can’t actually execute. Dedup keeps one bug from becoming five tickets.
- Remediation. Copilot Autofix, Snyk, and Semgrep Autofix generate the patch, not just the alert.
- Supply-chain defense. Socket flags malicious package behavior, the poisoning class that has no CVE yet.
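Reachability is the easiest of those three to picture in code. The toy version below assumes you already have a call graph as a plain dictionary and findings tagged with the function they live in. Both are simplifying assumptions of mine. Commercial tools build the graph for you and cope with dynamic dispatch, reflection, and transitive dependencies, which is exactly the part that is expensive to do yourself.

```python
from collections import deque


def reachable(call_graph: dict[str, list[str]], entry_points: list[str]) -> set[str]:
    """Breadth-first walk of the call graph from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def filter_reachable(
    findings: list[tuple[str, str]],  # (rule, function containing the finding)
    call_graph: dict[str, list[str]],
    entry_points: list[str],
) -> list[tuple[str, str]]:
    """Keep only findings in functions that an entry point can actually reach."""
    live = reachable(call_graph, entry_points)
    return [(rule, fn) for rule, fn in findings if fn in live]
```

With a graph where `main` calls `handler` and `handler` calls `parse`, a finding in `parse` survives the filter while one in an uncalled `legacy_export` drops out of the queue.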
The cost is per-seat money and lock-in. This is the same managed-versus-DIY tension I hit moving LLM infrastructure around in “The Ergonomics of Cheap Tokens.” You either pay a premium for someone to handle the complexity, or you save the cash and pay with your time.
The honest trade-off, and the hybrid I think will win
Nobody runs purely one or the other. The cheap, fast, deterministic gates are better DIY. Free, yours, they never phone home. The expensive-to-build capabilities (reachability triage, trustworthy autofix, behavioral supply-chain) are where a vendor earns its seat price. So build everything left of the PR, and buy selectively at the PR-and-CI layer.
| Gate | DIY default | Worth buying |
|---|---|---|
| Author | Claude Code security plugin + Semgrep, gitleaks | In-IDE platform, if budget allows |
| Commit | pre-commit: gitleaks, fast Semgrep | n/a |
| Pull request | GitHub Actions + reviewdog | Autofix: Copilot, Snyk, Semgrep |
| CI / deps | OSV-Scanner, Trivy, Syft SBOM, cosign | Reachability + Socket supply chain |
| Backstop | Scheduled CodeQL | Mythos / CodeMender |
Processes that outrank tools
- Paved roads. Bake the secure option into scaffolding so the easy path is the safe path. Every guardrail in the template is a finding no scanner has to generate.
- Lightweight threat modeling at design time. Scanners are worst at IDOR, broken authorization, and tenant-isolation leaks. Those are design flaws, not patterns.
- Trustworthy signal. A wall of unprioritized severity scores trains developers to ignore the scanner. A short list of reachable issues earns attention.
- Fixes as the default output. A finding that ships with its patch gets merged. One that ships as a ticket rots.
- Eliminate classes, don’t patch instances. Types that make illegal states unrepresentable remove whole categories of bug, at design time, for free.
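The last bullet is the one that benefits most from an example. A small sketch, assuming an HTML-rendering helper: the only way to get a `SafeHtml` value is through a constructor that escapes its input, and the renderer refuses anything else. `SafeHtml` and `render_comment` are hypothetical names, and Python only enforces this at runtime (a static type checker would catch the same mistake earlier), so treat it as an illustration of the idea, not a hardened library.

```python
import html


class SafeHtml:
    """Text that is escaped at construction, so an unescaped instance can't exist."""

    __slots__ = ("_value",)

    def __init__(self, untrusted: str) -> None:
        self._value = html.escape(untrusted)

    def __str__(self) -> str:
        return self._value


def render_comment(body: SafeHtml) -> str:
    """Render a user comment. Raw strings are rejected, not silently escaped."""
    if not isinstance(body, SafeHtml):
        raise TypeError("render_comment requires SafeHtml, got a raw value")
    return f"<p>{body}</p>"
```

Once every rendering path takes `SafeHtml`, a forgotten escape stops being a finding for a scanner to catch and becomes a call that fails the first time it runs.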
The point
By the time CodeMender or Mythos or whatever agent runs, the cheap issues were killed in the editor, the noisy ones filtered by reachability, and what remains is small, real, and arrives with a fix attached. The deep scan stops being a verdict and becomes a quiet confirmation.
The interesting part is that vendors now ship, or soon will, the first layer of that for you. The job didn’t disappear. It moved. Your work isn’t running the scanner anymore. It’s composing the gates: the free plugin in the editor, your own hooks around it, the boring plumbing at commit and CI, and the governed backstop at the end. That’s what “developers shouldn’t be scared of the results” actually means. Not that the scan is gentle. That nothing reaches it by surprise.
This post picks up where “Google’s CodeMender: When Security Tools Become Autonomous Agents” left off. That one was about the backstop. This one is about everything in front of it.
Sources and notes
- Google DeepMind, Introducing CodeMender (October 2025), InfoQ coverage of the 72-fix / 4.5M-line figures, and InfoWorld on the Google I/O 2026 Gemini Enterprise Agent Platform integration and the absence of published false-positive data.
- Anthropic, Claude Mythos Preview cybersecurity assessment and Project Glasswing for the backstop framing, plus reporting on the free Claude Code security plugin (May 2026) and its three in-flow checkpoints.
- Veracode, 2025 GenAI Code Security Report (~45% of samples carry an OWASP Top 10 weakness) and the Spring 2026 update.
- Open-source components: Semgrep, gitleaks, OSV-Scanner, Trivy, Syft, cosign, reviewdog, CodeQL, and the pre-commit framework. Commercial: Copilot Autofix, Snyk, Socket.