I just discovered that we’ve been accidentally doing spec-driven development.

I have 43 specialized AI agents across three repositories. Each agent has a detailed prompt defining its role, tools, and constraints. CLAUDE.md files capture project context and operating principles. Workflow documentation describes multi-step processes like content publishing and database migrations.

Then I read about spec-driven development and realized: these are specifications. We’ve been writing structured specs for AI agents to interpret and execute, without knowing that’s what we were doing.

Here’s how close we are to the formal spec-driven development methodology, what we’re doing right, and what we need to improve.

What is Spec-Driven Development?

Spec-driven development (SDD) is about writing structured documents that capture what you want the system to do, then letting AI agents interpret and implement those requirements.

The key distinction: specs are not prompts. A prompt is a one-off instruction in an interactive session. A spec is a versioned, structured document that defines requirements, constraints, and behavior.

Traditional development: You write detailed implementation instructions. “Create a function that takes two parameters, validates them, calls this API, handles these errors, and returns this format.”

Prompt engineering: You iteratively refine instructions through conversation. “Write a function for user authentication. No, use JWT tokens. Actually, add refresh tokens too. Can you handle rate limiting?”

Spec-driven development: You write requirements. “The system must authenticate users via JWT tokens with refresh capabilities and rate limiting.” The AI agent figures out how to implement it. Learn more from GitHub’s spec-driven development guide.
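To make the distinction concrete, here is a minimal sketch of that authentication requirement as a versioned spec file. The filename and structure are illustrative, not tied to any particular tool:

# specs/features/user-authentication.md (v1.2)

## Requirements
- Authenticate users via JWT access tokens
- Issue refresh tokens alongside access tokens
- Rate-limit authentication attempts

## Acceptance Criteria
- [ ] Valid credentials return an access/refresh token pair
- [ ] Repeated failed attempts from one client are throttled

Nothing here dictates libraries, endpoints, or token storage. Those are implementation decisions left to the agent.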

This is a fundamental shift from micromanagement to delegation. Instead of instructing agents on implementation details, you articulate intent and give agents autonomy over execution.

How Spec-Driven Development Compares to TDD and BDD

Spec-driven development isn’t the first methodology to use “specifications” or emphasize requirements. Here’s how it compares to other established approaches:

| Aspect | TDD (Test-Driven Development) | BDD (Behavior-Driven Development) | SDD (Spec-Driven Development) |
| --- | --- | --- | --- |
| Primary Focus | Code correctness through tests | Business behavior and requirements | AI agent coordination through specs |
| Who Writes | Developers | Developers + Product/Business | Developers working with AI |
| Format | Test code (xUnit, Jest, etc.) | Gherkin scenarios (Given/When/Then) | Structured docs (Markdown, YAML) |
| Execution | Test runner executes tests | Test runner interprets scenarios | AI agents interpret specs |
| Feedback Loop | Red-Green-Refactor cycle | Scenario validation | Spec refinement through agent execution |
| Primary Artifact | Test suite | Feature files + tests | Specification documents |
| Best For | Ensuring code correctness | Aligning features with business needs | Coordinating multiple AI agents |
| Verification | Tests pass/fail | Scenarios pass/fail | Agent behavior matches specs |
| When to Use | Always (for testable code) | When non-technical stakeholders are involved | When using AI agents for development |

Key Insight: TDD and BDD focus on verifying what humans build. Spec-driven development focuses on defining what AI agents should build. You can—and should—use them together. Your specs define requirements for AI agents, which then implement code using TDD, validated by BDD scenarios.

The Evolution from Vibe Coding to Structured Specifications

Before I had 43 agents, I was doing what the community calls “vibe coding.” This is the opposite of the structured patterns that experienced architects use. You know vibe coding if you’ve done it:

  • Ask Claude to “make the dashboard look better”
  • Get frustrated when it makes changes you didn’t want
  • Spend 20 minutes explaining what “better” means
  • Repeat until it matches the vibe in your head

Vibe coding works for quick experiments. It falls apart for anything complex or when multiple agents need to coordinate.

The problem isn’t the AI. The problem is unclear requirements. If I don’t know exactly what I want, the AI can’t know either. Iterative prompting is just debugging unclear specifications in real-time.

Spec-driven development fixes this by forcing clarity upfront. When you write a spec, you have to define:

  • What the system should do (requirements)
  • What constraints apply (performance, security, compatibility)
  • What success looks like (acceptance criteria)
  • How components interact (integration points)

Writing specs is harder than vibe coding. But running structured workflows with clear specifications is vastly more reliable than hoping the AI understands your vibe.

The upfront investment in clarity pays dividends in execution reliability.

10 Key Characteristics of Effective Specifications

The article I read outlines ten essential characteristics of specs. Let me evaluate our system against each:

1. Specs Formulate Intent as Requirements

What it means: Specs describe what, not how. Requirements, not implementation.

Our system: CLAUDE.md files do this well. The marketing repo’s CLAUDE.md specifies goals (“10 alpha testers by Dec 31, 2025”), constraints (“Budget: $500 Q4”), and operating principles (“Customer discovery first”). It doesn’t prescribe how to achieve these.

Gap: Individual agent prompts sometimes mix requirements with implementation details. The content-publishing-specialist prompt includes both “what” (validate frontmatter) and “how” (specific validation steps).

Improvement: Separate requirements from procedures. Agent prompts should define the agent’s purpose and success criteria; workflow procedures should live in separate task specs.
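A sketch of what that separation could look like, using hypothetical file paths:

# agents/content-publishing-specialist.md (identity spec)
Purpose: publish validated blog content without breaking the live site.
Success: post is live with valid frontmatter, a hero image, and working links.

# specs/workflows/publishing.md (task spec)
1. Validate frontmatter against the schema
2. Confirm the hero image exists
3. Verify internal links resolve

The identity spec survives workflow changes; the task spec can be rewritten without touching the agent’s purpose.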

2. Multiple Flavors of Specs Exist

What it means: Four types—functional/technical features, agent identities, task/workflow specs, and usage/knowledge specs.

Our system: We have three of four:

  • Agent identity specs: Each .claude/agents/*.md file defines an agent’s role, mission, and capabilities
  • Task/workflow specs: Content publishing workflow, migration procedures
  • Usage/knowledge specs: CLAUDE.md serves as project-level knowledge

Gap: No formal feature specs. When I need a new feature, I describe it ad-hoc rather than writing a structured specification document.

Improvement: Create specs/features/ directory with structured requirements for each major feature. Include acceptance criteria, dependencies, and constraints.

3. Format Flexibility (Markdown Dominates)

What it means: While format matters less than agent responsiveness, Markdown is preferred for readability.

Our system: Everything is Markdown. Agent prompts, CLAUDE.md files, workflow documentation. Even our blog frontmatter uses YAML within Markdown.

What we’re doing right: Markdown is git-friendly, human-readable, and easy for AI to parse. No changes needed here.

4. Context Window Constraints

What it means: Large specs overload context. Solutions include RAG, specialized subagents, and intelligent selection.

Our system: This is our biggest weakness. CLAUDE.md files are growing large (200+ lines). When all 16 agent files load simultaneously, we’re burning context window on specs the current agent doesn’t need.

Gap: No intelligent spec selection. Claude Code loads all agents regardless of relevance. No RAG system for retrieving only relevant specs.

Improvement:

  • Break large CLAUDE.md into focused sections (.claude/specs/goals.md, .claude/specs/workflows.md, .claude/specs/constraints.md)
  • Implement routing logic so agents only load relevant specs
  • Use subagents with focused context windows for specialized tasks

5. Regeneration Realities

What it means: Full autonomous spec regeneration is hard. It requires planning methods like backlog.md and iterative refinement.

Our system: We don’t do spec regeneration at all. When requirements change, I manually update CLAUDE.md and agent prompts. This is one area where our spec-driven development implementation needs improvement.

Gap: No systematic approach to evolving specs based on learnings. No backlog.md pattern for tracking required changes.

Improvement: Adopt the backlog pattern:

# backlog.md

## Agent Improvements Needed
- [ ] content-publishing-specialist: Add LinkedIn cross-post support
- [ ] seo-specialist: Integrate GA4 data for keyword performance

## CLAUDE.md Updates
- [ ] Add hardware fulfillment workflow (when >50 customers)
- [ ] Update pricing strategy based on alpha tester feedback

6. Tool Ecosystem

What it means: 450+ tools are tracked, including GitHub Spec Kit, AWS Kiro, and Claude Flow. Consistency matters more than features.

Our system: We use Claude Code exclusively with custom agent configurations. No external spec tooling beyond what Claude Code provides.

What we’re doing right: Consistency. One tool, one approach, deeply integrated with our workflow.

Opportunity: Explore GitHub Copilot Workspace or AWS Kiro when we need collaborative spec management across team members.

7. Spec Sharing

What it means: Registries similar to npm or Helm Charts enable organizational reuse.

Our system: Agents are repo-specific. The seo-specialist in brandcast-marketing is different from hypothetical SEO agents in other projects.

Gap: No reusable spec library for spec-driven development. Can’t easily share agent configurations across projects.

Improvement: Extract general-purpose agents into a shared .claude/library/ that multiple projects can reference. Treat specs as reusable components.

8. Code Review Still Essential

What it means: Human oversight remains critical. Separating implementation and testing agents improves quality.

Our system: This is where we excel. Every git commit is human-reviewed. Agents don’t push to production. Critical operations (migrations, publishing) require human approval.

What we’re doing right: Clear boundaries on agent autonomy. Documentation agents run for 15-20 minutes unsupervised, but their output is reviewed before use. Code changes go through pull requests. Financial projections require validation before decisions.

Improvement: Add test-writing agents separate from implementation agents, as recommended. Currently the same agent writes code and tests, which can create blind spots.

9. Legacy Applications

What it means: Strangler pattern enables gradual modernization.

Our system: Not applicable yet. BrandCast is greenfield, built with AI assistance from day one.

Future consideration: When we eventually need to modernize our own codebase, we’ll use specs to define boundaries between legacy and new systems.

10. Early but Valuable

What it means: SDD isn’t a silver bullet, but offers real structure compared to vibe coding.

Our assessment: This perfectly describes our experience. Structured specs (CLAUDE.md, agent prompts, workflows) make multi-agent coordination possible. Vibe coding couldn’t handle 43 specialized agents across three domains.

The value: Predictability. When I invoke the seo-specialist, I know what it will do because its spec defines its behavior.

When I run the content publishing workflow, I know the steps because they’re documented. This reliability would be impossible with ad-hoc prompting.

CLAUDE.md Files: Accidental Specification Architecture

Let’s look at what CLAUDE.md actually is in spec-driven development terms.

Our marketing repo’s CLAUDE.md contains:

  • Project context: Pre-beta phase, Q4 2025 goals, budget constraints
  • Product definitions: BrandCast and FamilyCast value propositions, pricing
  • Operating principles: Customer discovery first, bootstrap mentality, feature discipline
  • Workflows: Content publishing, weekly planning, cross-team communication
  • Success metrics: 10 alpha testers, <30% churn, pricing validation

This is a project-level specification. It captures the “what” (goals and constraints) without prescribing “how” (implementation details).
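A condensed sketch of the file, paraphrased from the sections above rather than quoted verbatim:

# CLAUDE.md (excerpt)

## Context
- Phase: pre-beta, Q4 2025
- Budget: $500 for Q4

## Goals
- 10 alpha testers by Dec 31, 2025
- <30% churn among alpha testers

## Operating Principles
- Customer discovery first
- Bootstrap mentality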

When the weekly-planning-specialist agent runs, it reads CLAUDE.md to understand:

  • Current phase (pre-beta) informs priorities
  • Goal (10 alpha testers) determines time allocation
  • Constraints (no revenue expected) shape recommendations
  • Principles (customer discovery first) guide decision-making

CLAUDE.md acts as a requirements document that multiple agents consult to align their behavior with project goals.

What we’re doing right:

  • Single source of truth for project context
  • Version controlled (git tracks changes)
  • Updated as requirements evolve
  • Referenced by multiple agents

What needs improvement:

  • Too large (200+ lines)
  • Mixes multiple concern levels (goals, workflows, guidelines)
  • No clear sections for agents to selectively load
  • Some information duplicated in agent prompts

Agent Prompts as Identity Specifications

Each agent in .claude/agents/*.md is an identity specification. It defines:

Who the agent is:

You are an SEO Specialist for BrandCast. You create AND audit
SEO-optimized content for B2B and B2C audiences.

What the agent does:

Your mission is twofold:
1. CREATE content that ranks on page 1 AND converts readers
2. AUDIT existing content to identify SEO improvements

What tools the agent uses:

Tools available:
- WebSearch for keyword research
- Grep for finding existing content
- Edit for updating posts

What constraints apply:

NEVER fabricate statistics. ALWAYS link to sources.
All claims must be verifiable or clearly labeled as opinion.

This is a formal specification. It’s structured, versioned, and executable by Claude Code’s agent orchestration system.

Compare this to vibe coding:

  • Vibe: “Help me with SEO stuff”
  • Spec: “You are an SEO Specialist. Your mission is [detailed]. Use these tools [specific]. Follow these constraints [explicit].”

The spec approach produces consistent, predictable behavior because the agent’s role is formally defined.

This formality is what enables reliable multi-agent coordination.

What we’re doing right:

  • Clear role definitions
  • Explicit tool access
  • Constraint specifications
  • Example-driven documentation

Gaps:

  • No versioning strategy (how do we deprecate old agent behaviors?)
  • Limited error handling specs (what does agent do when tools fail?)
  • No performance criteria (how fast should agents complete tasks?)
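One way to start closing these gaps is a metadata header on each agent spec. The fields below are hypothetical (Claude Code doesn’t interpret them); they document expectations for the humans maintaining the spec and for the agent reading it:

---
version: 2.1
deprecates: behaviors from versions before 2.0
on_tool_failure: report the failure and stop; never fabricate results
max_task_minutes: 20
---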

Workflow Documentation: Task Specs in Practice

Our workflow documentation is task specifications. Look at the content publishing workflow:

### Content Publishing Workflow
1. Draft: Write in content/{brand}/drafts/
2. Images: Add hero image (see Image Guidelines)
3. Links: Reference help docs
4. Validate: Use content-publishing-specialist
5. Publish: Use slash commands

This is a sequential workflow spec. It defines the steps, their order, and the agents involved. Sequential workflows are one of the core agentic AI patterns.

The content-publishing-specialist agent interprets this spec:

  1. Locate draft in expected location
  2. Validate frontmatter schema
  3. Check hero image exists
  4. Verify internal links
  5. Execute publishing slash command

This is structured delegation. The workflow spec says “what” needs to happen. The agent spec says “how” to execute each step.

What we’re doing right:

  • Clear step sequences
  • Explicit hand-offs between stages
  • Error conditions documented
  • Expected outcomes defined

Opportunities:

  • Formalize as machine-readable specs (YAML workflow definitions; see the sketch after this list)
  • Add rollback procedures for each step
  • Define timeout and retry logic
  • Specify parallel vs sequential execution
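Here’s a hedged sketch of the publishing workflow as a machine-readable spec. The schema is invented for illustration (no tool consumes it today), but it shows where retries, timeouts, rollback, and execution mode would live:

name: content-publishing
execution: sequential            # explicit, rather than implied by prose ordering
steps:
  - id: validate-frontmatter
    agent: content-publishing-specialist
    timeout_minutes: 5
    retries: 2
    on_failure: halt             # nothing is published yet, so halting is safe
  - id: check-hero-image
    agent: content-publishing-specialist
    on_failure: halt
  - id: publish
    run: publishing slash command
    requires_human_approval: true
    on_failure: rollback-deployment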

What Our Agent System Gets Right About Spec-Driven Development

Looking at our 43-agent system through the spec-driven development lens, here’s what’s working:

1. Structured Requirements Over Ad-hoc Prompts

Every agent operation starts with a spec. CLAUDE.md defines project requirements. Agent files define identity specs. Workflows define task specs.

This is the core of spec-driven development: formalized intent that agents execute consistently.

2. Version Control for Specifications

All specs are in git. When we change an agent’s behavior, we edit its spec file and commit. This creates an audit trail of requirement changes and their rationale.

3. Separation of Concerns

Marketing agents don’t know about database schemas. Engineering agents don’t make pricing decisions. Business agents don’t write blog posts.

Each agent’s spec defines its domain. This separation enables specialization without overlap or conflict. Not all 1,189 AI agents matter—but specialized ones do.

4. Human-in-the-Loop for Critical Decisions

Specs define where humans must approve:

  • Blog publishing (manual slash command execution)
  • Database migrations (review generated SQL before applying)
  • Financial projections (validate assumptions before decisions)

This balances automation with oversight, exactly as spec-driven development recommends.

5. Testing as Intent Expression

Our content integrity rules are specifications: “All quotes must be linked to sources. All statistics must be cited.” The publishing agent validates against these specs before deployment.

Tests and specs work together to define correct behavior.
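As a sketch, a pre-publish check for those two rules might look like the Python below. The regexes are deliberately crude (a real check would parse the Markdown properly), and the “nearby link” heuristic is my assumption, not how our agent actually validates:

import re
import sys
from pathlib import Path

QUOTE = re.compile(r'^\s*>')               # Markdown blockquote
LINK = re.compile(r'\[[^\]]+\]\([^)]+\)')  # [text](url)
STAT = re.compile(r'\d+(\.\d+)?\s*%')      # e.g. "30%"

def check_post(path: Path) -> list[str]:
    """Flag quotes and statistics with no source link on a nearby line."""
    problems = []
    lines = path.read_text().splitlines()
    for i, line in enumerate(lines):
        context = " ".join(lines[max(0, i - 1):i + 2])  # line plus neighbors
        if QUOTE.match(line) and not LINK.search(context):
            problems.append(f"{path}:{i + 1}: quote with no source link")
        if STAT.search(line) and not LINK.search(context):
            problems.append(f"{path}:{i + 1}: statistic with no citation")
    return problems

if __name__ == "__main__":
    issues = [p for f in sys.argv[1:] for p in check_post(Path(f))]
    print("\n".join(issues) or "content integrity checks passed")
    sys.exit(1 if issues else 0)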

Critical Gaps: Context Windows, Regeneration, and Backlog Management

Despite getting a lot right, we have significant gaps compared to mature spec-driven development:

Gap 1: Context Window Inefficiency

When Claude Code loads an agent, it loads the entire CLAUDE.md file plus all agent specs. For a simple task like “validate this blog post,” the agent gets:

  • 200 lines of project context
  • 16 agent definitions (including agents it won’t use)
  • Multiple workflow documents

This burns context window on irrelevant information.

Needed: Intelligent spec selection. The content-publishing-specialist should load only:

  • Content publishing workflow spec
  • Blog configuration spec
  • Content integrity rules
  • Publishing agent identity

Everything else is noise.

Gap 2: No Specification Regeneration

When requirements change, I manually update specs. There’s no systematic process for:

  • Identifying which specs need updating
  • Regenerating specs based on learnings
  • Validating that all affected specs are consistent

Needed: The backlog.md pattern. A living document tracking:

  • Spec improvements required
  • Changes triggered by customer feedback
  • Inconsistencies across specs
  • Deprecations and migrations

Gap 3: Missing Feature Specifications

When I need a new feature, I describe it conversationally: “Add LinkedIn cross-posting to the publishing workflow.”

This works but isn’t spec-driven. A proper feature spec would define:

# Feature Spec: LinkedIn Cross-Posting

## Requirements
- Publish blog posts to LinkedIn after main blog deployment
- Support both personal and company LinkedIn pages
- Preserve formatting (code blocks, headers, links)
- Track publish status (success/failure)

## Constraints
- LinkedIn API rate limits (100 posts/day)
- Post length limit (3000 characters)
- Manual OAuth authentication (no stored tokens)

## Acceptance Criteria
- [ ] Draft published to blog
- [ ] Content formatted for LinkedIn character limits
- [ ] LinkedIn post created via API
- [ ] Publish status recorded
- [ ] Errors logged with rollback guidance

Needed: A specs/features/ directory with structured requirements for each feature under development.

Gap 4: No Test Separation

Currently, implementation agents sometimes write their own tests. This creates confirmation bias—the agent tests for what it thinks it built, not necessarily what the spec requires.

Needed: Separate test-writing agents that work from specs:

  1. Read feature spec
  2. Generate acceptance tests
  3. Implementation agent builds to pass tests
  4. Testing agent validates against spec

This separation ensures tests reflect requirements, not implementation assumptions.
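For example, given the LinkedIn cross-posting spec above, a test-writing agent might emit something like this. The format_for_linkedin function and its module are hypothetical; the 3,000-character limit comes straight from the spec’s constraints:

# test_linkedin_crosspost.py
# Hypothetical module under test; the implementation agent's job is
# to make these spec-derived tests pass.
from linkedin_crosspost import format_for_linkedin

def test_respects_linkedin_length_limit():
    # Constraint from the spec: post length limit (3,000 characters)
    post = format_for_linkedin("word " * 2000)
    assert len(post) <= 3000

def test_preserves_links():
    # Requirement from the spec: preserve formatting (links)
    post = format_for_linkedin("See [the docs](https://example.com).")
    assert "https://example.com" in post

Note that both tests cite the spec, not the code.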

Implementing the backlog.md Pattern

The most actionable improvement from spec-driven development is adopting the backlog pattern.

Create backlog.md in each repo:

# Specification Backlog

## High Priority

### Content Publishing Workflow
- [ ] Add LinkedIn API integration
- [ ] Implement automatic image resizing for social previews
- [ ] Add retry logic for failed Gemini image generation

### Agent Improvements
- [ ] seo-specialist: Integrate GA4 keyword performance data
- [ ] weekly-planning-specialist: Auto-detect automation failures from health checks

### CLAUDE.md Updates
- [ ] Split into focused sections (goals.md, workflows.md, constraints.md)
- [ ] Add hardware fulfillment workflow spec (when >50 customers)
- [ ] Document cross-repo agent coordination patterns

## Medium Priority

### New Feature Specs Needed
- [ ] Customer interview scheduling system
- [ ] Automated email drip campaigns
- [ ] Competitive pricing analysis dashboard

### Testing Infrastructure
- [ ] Create test-writing agents separate from implementation
- [ ] Add acceptance test generation from feature specs
- [ ] Implement pre-commit spec validation

## Low Priority

### Agent Library
- [ ] Extract reusable agents to shared library
- [ ] Create agent templates for common patterns
- [ ] Document agent composition patterns

This makes spec maintenance explicit and trackable. When an agent fails or requirements change, add items to backlog.md. The weekly planning agent can reference this to prioritize spec improvements.

Making Our Agent System More Spec-Driven: 5 Specific Improvements

Based on this analysis, here are the concrete changes I’m making:

1. Split CLAUDE.md Into Focused Sections

Current: One 200-line file with mixed concerns

Improved:

.claude/
  specs/
    goals.md          # Q4 objectives, success metrics
    products.md       # BrandCast/FamilyCast definitions
    principles.md     # Operating principles, constraints
    workflows/
      publishing.md
      migrations.md
      planning.md
    cross-team.md     # GitHub coordination protocols

Agents load only relevant sections. Publishing agent reads publishing.md, not the entire project context.

2. Create Feature Specification Template

Current: Ad-hoc feature requests in conversation

Improved: Template in specs/features/_template.md:

# Feature: [Name]

## Requirements
- [What the feature must do]

## Constraints
- [Performance, security, compatibility limits]

## Acceptance Criteria
- [ ] [Testable outcomes]

## Dependencies
- [Required components, data, or services]

## Risks
- [Potential failure modes and mitigations]

New features start with a spec, not a conversation.

3. Implement backlog.md in All Repos

Current: Spec improvements happen reactively when problems surface

Improved: Maintain backlog.md with three priorities:

  • High: Blocks current work
  • Medium: Improves efficiency
  • Low: Nice to have

Weekly planning agent reviews backlog and allocates time to spec maintenance.

4. Add Test-Writing Agent Separate from Implementation

Current: Code-writing agents generate their own tests

Improved: New agent in .claude/agents/test-writer.md:

You are a Test-Writing Specialist. You write acceptance tests
from feature specifications, separate from implementation.

Your mission:
1. Read feature spec from specs/features/
2. Generate acceptance tests covering all requirements
3. Ensure tests are independent of implementation choices
4. Validate tests against spec, not code

This ensures tests validate requirements, not just implementation.

5. Add Context Window Optimization to Agent Routing

Current: All specs load for every agent invocation

Improved: Agent routing logic in CLAUDE.md:

## Agent Context Loading Rules

When invoking agents, load ONLY relevant specs:

- content-publishing-specialist:
  - workflows/publishing.md
  - blog configuration
  - content integrity rules

- seo-specialist:
  - products.md (for brand voice)
  - SEO guidelines
  - Analytics access

- weekly-planning-specialist:
  - goals.md
  - principles.md
  - All workflow summaries (not full content)

This reduces context window usage by 60-70% for most agent invocations.
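Here’s a sketch of how those rules could be applied mechanically, assuming the split spec layout from improvement 1. None of this is built into Claude Code; the mapping simply mirrors the rules above:

from pathlib import Path

SPEC_ROOT = Path(".claude/specs")

# Mirrors the loading rules above; extend as more specs are split out.
AGENT_SPECS = {
    "content-publishing-specialist": ["workflows/publishing.md"],
    "seo-specialist": ["products.md"],
    "weekly-planning-specialist": ["goals.md", "principles.md"],
}

def load_context(agent: str) -> str:
    """Concatenate only the spec files a given agent needs."""
    paths = [SPEC_ROOT / p for p in AGENT_SPECS.get(agent, [])]
    return "\n\n".join(p.read_text() for p in paths if p.exists())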

Lessons Learned: Specifications Before AI Agents

The meta-lesson from discovering we’re accidentally doing spec-driven development: the methodology emerged from necessity, not theory.

I didn’t set out to implement spec-driven development. I set out to build a system where 43 AI agents could coordinate reliably across three domains.

Structured specifications were the only way to make that work.

Vibe coding breaks down when:

  • Multiple agents need to coordinate
  • Requirements are complex and interdependent
  • Consistency matters more than speed
  • You need to track why decisions were made

Sometimes you need AI agents, sometimes you need shell scripts. The key is knowing which tool fits the problem.

Spec-driven development fixes these problems by forcing clarity upfront. The time invested writing structured specifications pays off in:

  • Predictability: Agents behave consistently because specs define behavior
  • Debuggability: When something fails, compare execution to spec
  • Evolvability: Change specs, agents adapt without rework
  • Coordination: Multiple agents work from shared requirements

The article I read calls it spec-driven development. I was calling it “making AI agents actually useful.” Same thing.

If you’re building with AI agents and finding vibe coding unpredictable, you don’t need new tools. You need better specifications. Write down:

  • What you want the system to do (requirements)
  • What constraints apply (performance, security)
  • What success looks like (acceptance criteria)
  • How components interact (integration specs)

Then let agents execute from those specs. It’s more upfront work than ad-hoc prompting, but it’s the only approach that scales beyond toy projects.

Frequently Asked Questions

What is spec-driven development?

Spec-driven development (SDD) is a methodology where you write structured documents that capture what you want the system to do, then let AI agents interpret and implement those requirements. Unlike traditional prompt engineering where you iteratively refine instructions, specs are versioned, structured documents that define requirements, constraints, and behavior.

How is spec-driven development different from prompt engineering?

Prompt engineering involves iterative conversation to refine what you want: “Write a function for user authentication. No, use JWT tokens. Actually, add refresh tokens too.” Spec-driven development defines requirements upfront: “The system must authenticate users via JWT tokens with refresh capabilities.” The AI figures out implementation. SDD produces more consistent, predictable results because requirements are formally defined before execution.

What are the benefits of using specifications with AI agents?

Specifications provide four key benefits: Predictability (agents behave consistently because specs define behavior), Debuggability (when something fails, compare execution to spec), Evolvability (change specs and agents adapt without rework), and Coordination (multiple agents work from shared requirements without conflicts).

Do I need special tools for spec-driven development?

No. You can start with Markdown files in git. Our 43-agent system uses only Claude Code with custom agent configurations. What matters is structured, versioned specifications—not specialized tooling. Tools like GitHub Spec Kit or AWS Kiro can help at scale, but aren’t required to get value from spec-driven development.

When should I use spec-driven development instead of vibe coding?

Use spec-driven development when: multiple agents need to coordinate, requirements are complex and interdependent, consistency matters more than speed, or you need to track why decisions were made. Stick with vibe coding for quick experiments and simple one-off tasks where iteration is faster than upfront specification.


What’s Next

I’m implementing the five improvements over the next two weeks:

Week 1:

  • Split CLAUDE.md into focused sections
  • Create feature spec template
  • Add backlog.md to all three repos

Week 2:

  • Build test-writing agent
  • Implement context window optimization
  • Document agent routing logic

Then I’ll write a follow-up post: “Implementing Spec-Driven Development: What Changed After 30 Days”

The goal isn’t to follow a methodology perfectly. The goal is to make our 43-agent system more reliable, maintainable, and effective at solving real problems.

Turns out we were already doing most of that. Now we know what to call it and where to improve.


Building with AI agents? Read more about agent orchestration and agentic design patterns. Find me on LinkedIn or GitHub.