I have a RAG pipeline for my digital signage platform. It powers our support bot, Chip. The architecture diagram looked impressive: GitHub repositories, Cloud Storage buckets, Vertex AI integrations, seven database tables tracking file hashes and sync states. It worked beautifully. Chip picks up any updated articles from our knowledge base every night. For a one-person startup, I was quite proud of what I'd built.

Then Google released File Search for the Gemini API last week, and I realized I could immediately retire a big chunk of that architecture as technical debt.

The Problem: RAG Pipeline Complexity

Let me show you what I built. My documentation sync system needed to keep help docs from two GitHub repositories available for AI-powered support queries. The workflow looked like this:

GitHub Repo → Fetch Markdown → Upload to GCS → Import to Vertex RAG → Query RAG

Here’s what that actually meant:

  1. Cron job triggers daily at 2 AM UTC
  2. DocSyncService checks GitHub for changed files (incremental sync)
  3. Three-tier optimization checks: database hash → GCS metadata → full upload (sketched just below this list)
  4. Changed files uploaded to Cloud Storage with SHA256 hash metadata
  5. Import files to Vertex AI RAG corpus (async polling, about 5 minutes)
  6. RagService queries the corpus at runtime
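
For flavor, here's a compressed sketch of the method behind step 3's three-tier check. It lives on the sync service shown further down; the helper names (getMetadata, updateFileHash) are illustrative rather than my exact production code:

// Sketch of the three-tier deduplication check (helper names illustrative)
async filterChanged(files: RepoFile[], dbHashes: Map<string, string>): Promise<RepoFile[]> {
  const changed: RepoFile[] = [];
  for (const file of files) {
    const hash = this.hash(file.content);

    // Tier 1: database hash matches → unchanged, skip entirely
    if (dbHashes.get(file.path) === hash) continue;

    // Tier 2: GCS object metadata matches → DB row was stale, resync it and skip the upload
    const gcsMeta = await this.gcs.getMetadata(file.path);
    if (gcsMeta?.sha256 === hash) {
      await this.db.updateFileHash(file.path, hash);
      continue;
    }

    // Tier 3: genuinely changed → needs a full upload and re-import
    changed.push(file);
  }
  return changed;
}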

The implementation sprawled across 1,240 lines of TypeScript, seven database tables, and three separate services. Every new feature meant coordinating changes across multiple components.

The funny part? It was working perfectly. The scheduled job had a 100% success rate processing 55-57 documentation files. Response times were great. Cost was minimal.

But complexity has a cost you don’t see on your cloud bill.

Enter File Search

File Search is a fully managed RAG system built directly into the Gemini API. You upload documents, and Google handles the chunking, embedding generation, vector indexing, and semantic search.

The key difference from the regular Files API is persistence. Standard file uploads disappear after 48 hours. Documents imported into a File Search store stay indefinitely until you delete them.
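
In practice, that means you create a store once and push documents into it as they change. A minimal sketch with the @google/genai SDK might look like this; the store name and config fields are my reading of the docs, so verify the exact shapes before copying:

// Minimal sketch: create a persistent File Search store and upload one article.
// Field names are approximate; check the official File Search docs.
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One-time setup: the store persists until you delete it
const store = await ai.fileSearchStores.create({
  config: { displayName: 'help-docs' },
});

// Per document: upload straight into the store; chunking and embedding happen server-side
await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: store.name,
  file: 'docs/getting-started.md',
  config: { displayName: 'getting-started.md' },
});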

From the official announcement, Google now handles:

  • Automatic document chunking (configurable token limits)
  • Embedding generation and storage
  • Vector search infrastructure
  • Metadata filtering and organization
  • Citation tracking via grounding metadata

You get all the RAG functionality without building any of the infrastructure.
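
Querying is then just a normal generateContent call with the store attached as a tool, and citations come back in the grounding metadata. Another hedged sketch, reusing the ai client and store handle from the snippet above (tool and field names follow the announcement; double-check them against the docs):

// Sketch: answer a support question from the store and inspect the citations
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'How do I pair a new screen with my account?',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [store.name] } }],
  },
});

console.log(response.text);

// Grounding metadata tells you which chunks and documents backed the answer
const grounding = response.candidates?.[0]?.groundingMetadata;
console.log(grounding?.groundingChunks);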

Refactoring the Architecture

Here’s the comparison that made me reconsider everything:

Before: Custom Vertex RAG Pipeline

// Simplified version of what I had
class DocSyncService {
  async syncRepository(repo: Repository) {
    // 1. Fetch files from GitHub
    const files = await this.github.getFiles(repo);

    // 2. Check database for existing hashes
    const existingHashes = await this.db.getFileHashes(files);

    // 3. Filter changed files (3-tier deduplication)
    const changedFiles = await this.filterChanged(files, existingHashes);

    // 4. Upload to GCS with metadata
    await Promise.all(
      changedFiles.map(f => this.gcs.upload(f, {
        metadata: { sha256: this.hash(f.content) }
      }))
    );

    // 5. Import to Vertex RAG (async polling)
    const operation = await this.vertexRag.importFiles(changedFiles);
    await this.pollUntilComplete(operation); // ~5 minutes

    // 6. Update database tracking
    await this.db.updateSyncState(changedFiles);
  }
}

Seven database tables tracked sync states, file hashes, import operations, repository metadata, and more. Three-tier optimization to avoid unnecessary uploads. Async polling to monitor import completion.
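
The async polling in step 5 was its own little subsystem too. Stripped down, the method looked something like this (the real version added backoff and failure alerting; getOperation is an illustrative wrapper around the Vertex operations API):

// Simplified polling loop for the Vertex RAG import operation
async pollUntilComplete(operation: { name: string }): Promise<void> {
  const deadline = Date.now() + 10 * 60 * 1000; // give up after 10 minutes
  while (Date.now() < deadline) {
    const status = await this.vertexRag.getOperation(operation.name);
    if (status.done) {
      if (status.error) throw new Error(`Import failed: ${status.error.message}`);
      return;
    }
    await new Promise(resolve => setTimeout(resolve, 15_000)); // check every 15 seconds
  }
  throw new Error('Import did not complete before the deadline');
}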

After: Gemini File Search

// What it became
class DocSyncService {
  async syncRepository(repo: Repository) {
    // 1. Fetch files from GitHub
    const files = await this.github.getFiles(repo);

    // 2. Upload directly to file search store
    await Promise.all(
      files.map(f => this.gemini.fileSearchStores.uploadToFileSearchStore({
        file: f.content,
        metadata: { repo: repo.name, path: f.path }
      }))
    );

    // 3. Wait for completion
    await this.pollUntilComplete();
  }
}

No intermediate storage. No hash tracking. No three-tier optimization. The file search store handles deduplication internally. Database needs shrink from seven tables to two or three for basic tracking.

Phil Schmid’s implementation guide shows the practical reality: about 300-400 lines of code replace the 1,240 I had written.

The Pricing Reality: Same Cost, Less Complexity

This is the part that surprised me. I expected a managed service to cost more. It doesn’t.

Current Cost (Vertex AI RAG)

Based on my actual usage:

  • Embeddings (indexing): ~50K tokens/month → ~$0.01/month
  • Storage (GCS): ~3.3MB average → ~$0.00/month
  • Retrieval queries: ~7.7M tokens/month → included in model costs
  • Total: ~$0.01/month plus inference

Projected Cost (Gemini File Search)

According to the File Search pricing docs:

  • Embeddings (indexing): ~50K tokens/month → ~$0.01/month
  • Storage: ~3.3MB average → free
  • Retrieval queries: ~7.7M tokens/month → included in model costs
  • Total: ~$0.01/month plus inference

Identical pricing. The managed service costs the same as rolling your own.

But the real savings aren’t on the cloud bill. They’re in:

  • Development time: Weeks of building vs. hours of integration
  • Maintenance burden: No more monitoring three-tier sync logic
  • Operational complexity: Fewer moving parts to debug at 3 AM
  • Context switching: One service instead of coordinating three

The cloud costs are a rounding error. The engineering costs are real.

What You Lose in the Migration

File Search isn’t better at everything. Here’s what I gave up:

Custom chunking logic: My Vertex RAG setup could implement sophisticated chunking strategies. File Search gives you configurable token limits and overlap, but you can’t plug in custom algorithms.
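
What you do get is a couple of knobs on the upload call: roughly a maximum chunk size and an overlap. Something along these lines, with the caveat that the field names are from my reading of the docs and may not match exactly:

// Sketch: chunking control is limited to token size and overlap
// (ai and store as created in the earlier sketch)
await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: store.name,
  file: 'docs/troubleshooting.md',
  config: {
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 200,  // upper bound on chunk size
        maxOverlapTokens: 20,    // tokens shared between adjacent chunks
      },
    },
  },
});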

Fine-grained access control: GCS buckets integrate with IAM at a granular level. File Search uses project-level permissions.

Multi-cloud flexibility: I was locked into Google Cloud either way, but custom GCS storage could theoretically be mirrored elsewhere. File Search is exclusively Google infrastructure.

Debug visibility: When something went wrong in my custom pipeline, I could inspect every step. File Search is a black box. If indexing fails, you get less diagnostic information.

For my use case (customer support docs for a small SaaS), none of these mattered. Your mileage may vary.

When to Use File Search vs. Custom RAG

File Search makes sense when:

  • You’re building on Google Cloud already
  • Your chunking needs are standard (token limits, overlap)
  • You want to ship faster and maintain less code
  • Your document volume fits within project limits (1GB-1TB depending on tier)

Build your own RAG pipeline when:

  • You need custom chunking algorithms (semantic, citation-aware, etc.)
  • You’re already invested in specific vector databases
  • You require multi-cloud architecture
  • You need fine-grained access control at the document level
  • Your scale exceeds File Search limits (20GB per store recommended)

Lessons Learned

I’m not sure how to feel about this. Part of me is relieved. The codebase gets simpler, I have less infrastructure to maintain, and the cost stays the same. That’s a clear win.

But another part of me remembers the weeks spent building that three-tier optimization system. The careful benchmarking of different chunking strategies. The satisfaction of seeing that 100% success rate in the cron logs.

That work wasn’t wasted. I learned a lot building it. But it was also over-engineering for the problem I actually had.

The hard part of software engineering isn’t writing code. It’s knowing when not to.

Google’s File Search API is good enough for most RAG use cases. That’s simultaneously disappointing and freeing. Disappointing because the interesting problem is solved. Freeing because I can focus on the parts of my product that actually differentiate it.

Sometimes the best architecture is the one you don’t have to build.


FAQ

Q: Can I migrate existing Vertex RAG pipelines to File Search?

Yes, but you’ll need to re-upload and re-index your documents. The underlying storage format is different. Budget time for the migration and testing, not just the code changes.

Q: Does File Search support the same file formats as the Files API?

File Search supports PDFs, Word docs, spreadsheets, code files, and numerous text formats. Check the official documentation for the complete list.

Q: What about the 48-hour file expiration in the regular Files API?

That’s the key difference. Documents uploaded to File Search stores persist indefinitely until you explicitly delete them. This makes them suitable for production RAG systems, not just prototypes.

Q: How many File Search stores can I create?

Current limit is 10 stores per project. Google recommends keeping individual stores under 20GB for optimal retrieval performance.

Q: Is the retrieval quality the same as custom Vertex RAG?

In my testing (via Google's ADK), yes. Both use similar embedding models and vector search techniques. The quality difference comes from your chunking strategy and document quality, not the infrastructure.