# 03 — RAG (Retrieval-Augmented Generation)
Level: General to Advanced
Format: Prose + code + Q&A pairs tagged [Easy] [Medium] [Hard]
## What You Will Learn
- Why LLMs hallucinate and how RAG addresses it (and its limits)
- How embeddings and vector databases enable semantic search — the math and the algorithms
- Chunking and indexing strategies that determine retrieval quality
- Dense, sparse, hybrid retrieval — BM25 formula, RRF, reranking mechanics
- Advanced patterns: Self-RAG, GraphRAG, FLARE, CRAG, Agentic RAG
- How to evaluate RAG systems with RAGAS and how to debug failures
- The complete taxonomy of RAG types from Naive → GraphRAG
- Production system design for 10M-100M document corpora
- Vertex AI Search, RAG Engine, Grounding API — when to use which
- Deploying, monitoring, and scaling RAG in production on GCP
- LLM-specific failure modes: hallucination, lost-in-the-middle, prompt injection
- 80+ curated Q&A pairs covering all weak spots
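As a preview of the retrieval math covered in the Retrieval Strategies chapter, Reciprocal Rank Fusion (RRF) merges ranked lists from different retrievers (e.g. dense vector search and BM25) using only ranks, so no score normalization is needed. A minimal sketch — the document IDs and the conventional k = 60 constant are illustrative:

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d3"]   # hypothetical vector-search ranking
sparse = ["d1", "d4", "d2"]  # hypothetical BM25 ranking
print(rrf([dense, sparse]))  # → ['d1', 'd2', 'd4', 'd3']
```

Note that documents appearing near the top of both lists (d1, d2) outrank documents that appear in only one list (d3, d4) — this robustness to disagreement is why RRF is a common default for hybrid search.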
## Chapter Map
| # | File | Topic | Difficulty |
|---|---|---|---|
| 1 | RAG Fundamentals | What RAG is, pipeline, RAG vs fine-tuning, naive RAG limitations | ★★☆ |
| 2 | Embeddings and Vector Stores | Embedding math, cosine vs dot vs euclidean, HNSW/IVF/PQ, vector DB comparison | ★★★ |
| 3 | Chunking and Indexing | Fixed/semantic/hierarchical chunking, parent-child, metadata filtering | ★★☆ |
| 4 | Retrieval Strategies | BM25 math, hybrid search, RRF formula, reranking, HyDE, MMR | ★★★ |
| 5 | RAG Types & Advanced Patterns | Naive→Modular→Agentic evolution; Self-RAG, FLARE, GraphRAG, CRAG, Multimodal | ★★★ |
| 6 | Evaluation and Failure Modes | RAGAS metrics/formulas, LLM-as-Judge, tracing, A/B testing | ★★★ |
| 7 | RAG System Design | Production architecture, latency budgets, caching, scaling 100M docs | ★★★ |
| 8 | Vertex AI RAG | Vertex AI Search, RAG Engine, Grounding API, Vector Search, AlloyDB | ★★★ |
| 9 | Production Deployment | Cloud Run vs GKE, monitoring, CI/CD eval gates, security, PII | ★★★ |
| 10 | Scaling and LLM Issues | Hallucination types, lost-in-the-middle, context management, prompt injection | ★★★ |
| 11 | Q&A Review Bank | 80+ Q&A pairs: fundamentals → system design → GCP → production debugging | ★★★ |
## System Designs
End-to-end production system designs with GCP service mapping, scalability analysis, and interview-style Q&A.
| # | Design | Pattern | Difficulty |
|---|---|---|---|
| SD-1 | Simple RAG Pipeline | Vector + BM25 hybrid, reranking, semantic cache | ★★★ |
| SD-2 | Agentic RAG — Hybrid Vector + Graph | ReAct agent, Spanner Graph, multi-hop reasoning | ★★★★ |
## Recommended Learning Paths
### Path A: First time through (full learning)
- RAG Types & Advanced Patterns — mental map first (taxonomy section)
- RAG Fundamentals — why RAG exists
- Embeddings and Vector Stores — the retrieval engine
- Chunking and Indexing — the quality bottleneck
- Retrieval Strategies — improve recall and precision
- RAG Types & Advanced Patterns — production-grade patterns (full read)
- Evaluation and Failure Modes — measure and debug
- RAG System Design — interview system design depth
- Vertex AI RAG — GCP-specific knowledge
- Scaling and LLM Issues — tricky edge cases
- Production Deployment — real-world ops
- Q&A Review Bank — final consolidation
### Path B: Accelerated Deep Dive (4–5 hours)
- RAG Types & Advanced Patterns — taxonomy + GraphRAG, Self-RAG (60 min)
- Retrieval Strategies — BM25 + hybrid math (45 min)
- RAG System Design — system design depth (60 min)
- Vertex AI RAG — GCP specifics (30 min)
- Q&A Review Bank — all 80+ Q&A pairs (90 min)
### Path C: Weak spots only
- System design → 07-RAG-System-Design.md
- Retrieval math → 04-Retrieval-Strategies.md
- Advanced patterns + taxonomy → 05-RAG-Types-and-Advanced-Patterns.md
- Vertex AI / GCP → 08-Vertex-AI-RAG.md
## Why RAG Comes After Prompts
RAG is fundamentally a prompt construction strategy — you retrieve context and inject it into a prompt. You need solid prompting foundations (Topic 02) before tackling RAG. You also need a working grasp of embeddings, which builds on how LLMs represent text (Topic 01).
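That "retrieve, then inject" claim fits in a few lines. Everything below is illustrative: the keyword-overlap `retrieve` is a stand-in for the embedding search covered later, and the prompt template is one of many reasonable choices.

```python
def retrieve(query, corpus, top_k=2):
    """Stand-in for semantic search: rank documents by word overlap."""
    words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, corpus):
    """The essence of RAG: retrieved context injected into a prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

corpus = [
    "RAG injects retrieved documents into the prompt.",
    "Fine-tuning updates model weights on new data.",
    "BM25 is a sparse lexical ranking function.",
]
print(build_prompt("How does RAG use the prompt?", corpus))
```

Swapping `retrieve` for a vector store and sending the result to an LLM gives you the naive RAG pipeline of chapter 1; everything after that (chunking, hybrid search, reranking, evaluation) is about making those two functions better.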
## Why RAG Comes Before MCP and Agents
RAG solves the knowledge problem for a single-turn pipeline. MCP generalizes context retrieval into a standard protocol for connecting models to external tools and data sources. Agents generalize it further, deciding autonomously when and what to retrieve. RAG is the stepping stone.