03 - RAG (Retrieval-Augmented Generation)

Level: General to Advanced
Format: Prose + code + Q&A pairs tagged [Easy] [Medium] [Hard]


What You Will Learn

  • Why LLMs hallucinate and how RAG addresses it (and its limits)
  • How embeddings and vector databases enable semantic search - the math and the algorithms
  • Chunking and indexing strategies that determine retrieval quality
  • Dense, sparse, hybrid retrieval - BM25 formula, RRF, reranking mechanics
  • Advanced patterns: Self-RAG, GraphRAG, FLARE, CRAG, Agentic RAG
  • How to evaluate RAG systems with RAGAS and how to debug failures
  • The complete taxonomy of RAG types from Naive → GraphRAG
  • Production system design for 10M-100M document corpora
  • Vertex AI Search, RAG Engine, Grounding API - when to use which
  • Deploying, monitoring, and scaling RAG in production on GCP
  • LLM-specific failure modes: hallucination, lost-in-the-middle, prompt injection
  • 80+ curated Q&A pairs covering all weak spots

Chapter Map

# | File | Topic | Difficulty
1 | RAG Fundamentals | What RAG is, pipeline, RAG vs fine-tuning, naive RAG limitations | ★★☆
2 | Embeddings and Vector Stores | Embedding math, cosine vs dot vs euclidean, HNSW/IVF/PQ, vector DB comparison | ★★★
3 | Chunking and Indexing | Fixed/semantic/hierarchical chunking, parent-child, metadata filtering | ★★☆
4 | Retrieval Strategies | BM25 math, hybrid search, RRF formula, reranking, HyDE, MMR | ★★★
5 | RAG Types & Advanced Patterns | Naive→Modular→Agentic evolution; Self-RAG, FLARE, GraphRAG, CRAG, Multimodal | ★★★
6 | Evaluation and Failure Modes | RAGAS metrics/formulas, LLM-as-Judge, tracing, A/B testing | ★★★
7 | RAG System Design | Production architecture, latency budgets, caching, scaling 100M docs | ★★★
8 | Vertex AI RAG | Vertex AI Search, RAG Engine, Grounding API, Vector Search, AlloyDB | ★★★
9 | Production Deployment | Cloud Run vs GKE, monitoring, CI/CD eval gates, security, PII | ★★★
10 | Scaling and LLM Issues | Hallucination types, lost-in-the-middle, context management, prompt injection | ★★★
11 | Q&A Review Bank | 80+ Q&A pairs: fundamentals → system design → GCP → production debugging | ★★★

System Designs

End-to-end production system designs with GCP service mapping, scalability analysis, and interview-style Q&A.

# | Design | Pattern | Difficulty
SD-1 | Simple RAG Pipeline | Vector + BM25 hybrid, reranking, semantic cache | ★★★
SD-2 | Agentic RAG - Hybrid Vector + Graph | ReAct agent, Spanner Graph, multi-hop reasoning | ★★★★

Path A: First time through (full learning)

  1. RAG Types & Advanced Patterns - mental map first (taxonomy section)
  2. RAG Fundamentals - why RAG exists
  3. Embeddings and Vector Stores - the retrieval engine
  4. Chunking and Indexing - the quality bottleneck
  5. Retrieval Strategies - improve recall and precision
  6. RAG Types & Advanced Patterns - production-grade patterns (full read)
  7. Evaluation and Failure Modes - measure and debug
  8. RAG System Design - interview system design depth
  9. Vertex AI RAG - GCP-specific knowledge
  10. Scaling and LLM Issues - tricky edge cases
  11. Production Deployment - real-world ops
  12. Q&A Review Bank - final consolidation

Path B: Accelerated Deep Dive (4–5 hours)

  1. RAG Types & Advanced Patterns - taxonomy + GraphRAG, Self-RAG (60 min)
  2. Retrieval Strategies - BM25 + hybrid math (45 min)
  3. RAG System Design - system design depth (60 min)
  4. Vertex AI RAG - GCP specifics (30 min)
  5. Q&A Review Bank - all 80+ Q&A pairs (90 min)

Path C: Weak spots only


Why RAG Comes After Prompts

RAG is fundamentally a prompt construction strategy: you retrieve context and inject it into a prompt. You need solid prompting foundations (Topic 02) before tackling RAG. You also need to understand embeddings, which in turn depends on understanding how LLMs represent text (Topic 01).
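
To make "retrieve context and inject it into a prompt" concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than material from the chapters: the function names (tokenize, retrieve, build_prompt), the three sample documents, and the prompt wording are all hypothetical, and the retriever is a toy bag-of-words cosine similarity standing in for a real embedding model and vector store.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (t.strip(".,?!:;()").lower() for t in text.split())
    return [t for t in tokens if t]

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Toy retriever: rank docs by bag-of-words cosine similarity.
    A real system would use dense embeddings and a vector index here."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(
        docs,
        key=lambda d: cosine(q_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, passages):
    """Inject retrieved passages into a prompt template (hypothetical wording)."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Hypothetical mini-corpus, for illustration only.
DOCS = [
    "BM25 is a sparse retrieval scoring function based on term frequency.",
    "HNSW is a graph-based index for approximate nearest neighbor search.",
    "Cloud Run is a managed platform for running containers.",
]

query = "What kind of index is HNSW?"
top_passages = retrieve(query, DOCS, k=1)
prompt = build_prompt(query, top_passages)
print(prompt)
```

The string returned by build_prompt is what would be sent to the LLM. Every later chapter refines one of these two steps: better retrieval (embeddings, chunking, hybrid search, reranking) or better prompt construction and context management.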

Why RAG Comes Before MCP and Agents

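RAG solves the knowledge problem for a single-turn pipeline: retrieve once, then generate. MCP generalizes this into a protocol for supplying external context and tools to a model. Agents generalize it further by letting the model make its own retrieval decisions. RAG is the stepping stone to both.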


Previous: 02 - Prompt Engineering | Next: 04 - MCP