03 - RAG (Retrieval-Augmented Generation)

Level: General to Advanced
Format: Prose + code + Q&A pairs tagged [Easy] [Medium] [Hard]

What You Will Learn

Why LLMs hallucinate and how RAG addresses it (and its limits)
How embeddings and vector databases enable semantic search - the math and the algorithms
Chunking and indexing strategies that determine retrieval quality
Dense, sparse, hybrid retrieval - BM25 formula, RRF, reranking mechanics
Advanced patterns: Self-RAG, GraphRAG, FLARE, CRAG, Agentic RAG
How to evaluate RAG systems with RAGAS and how to debug failures
The complete taxonomy of RAG types from Naive → GraphRAG
Production system design for 10M-100M document corpora
Vertex AI Search, RAG Engine, Grounding API - when to use which
Deploying, monitoring, and scaling RAG in production on GCP
LLM-specific failure modes: hallucination, lost-in-the-middle, prompt injection
80+ curated Q&A pairs covering all weak spots

Chapter Map

#	File	Topic	Difficulty
1	RAG Fundamentals	What RAG is, pipeline, RAG vs fine-tuning, naive RAG limitations	★★☆
2	Embeddings and Vector Stores	Embedding math, cosine vs dot vs euclidean, HNSW/IVF/PQ, vector DB comparison	★★★
3	Chunking and Indexing	Fixed/semantic/hierarchical chunking, parent-child, metadata filtering	★★☆
4	Retrieval Strategies	BM25 math, hybrid search, RRF formula, reranking, HyDE, MMR	★★★
5	RAG Types & Advanced Patterns	Naive→Modular→Agentic evolution; Self-RAG, FLARE, GraphRAG, CRAG, Multimodal	★★★
6	Evaluation and Failure Modes	RAGAS metrics/formulas, LLM-as-Judge, tracing, A/B testing	★★★
7	RAG System Design	Production architecture, latency budgets, caching, scaling 100M docs	★★★
8	Vertex AI RAG	Vertex AI Search, RAG Engine, Grounding API, Vector Search, AlloyDB	★★★
9	Production Deployment	Cloud Run vs GKE, monitoring, CI/CD eval gates, security, PII	★★★
10	Scaling and LLM Issues	Hallucination types, lost-in-the-middle, context management, prompt injection	★★★
11	Q&A Review Bank	80+ Q&A pairs: fundamentals → system design → GCP → production debugging	★★★

System Designs

End-to-end production system designs with GCP service mapping, scalability analysis, and interview-style Q&A.

#	Design	Pattern	Difficulty
SD-1	Simple RAG Pipeline	Vector + BM25 hybrid, reranking, semantic cache	★★★
SD-2	Agentic RAG - Hybrid Vector + Graph	ReAct agent, Spanner Graph, multi-hop reasoning	★★★★

Recommended Learning Paths

Path A: First time through (full learning)

RAG Types & Advanced Patterns - mental map first (taxonomy section)
RAG Fundamentals - why RAG exists
Embeddings and Vector Stores - the retrieval engine
Chunking and Indexing - the quality bottleneck
Retrieval Strategies - improve recall and precision
RAG Types & Advanced Patterns - production-grade patterns (full read)
Evaluation and Failure Modes - measure and debug
RAG System Design - interview system design depth
Vertex AI RAG - GCP-specific knowledge
Scaling and LLM Issues - tricky edge cases
Production Deployment - real-world ops
Q&A Review Bank - final consolidation

Path B: Accelerated Deep Dive (4–5 hours)

RAG Types & Advanced Patterns - taxonomy + GraphRAG, Self-RAG (60 min)
Retrieval Strategies - BM25 + hybrid math (45 min)
RAG System Design - system design depth (60 min)
Vertex AI RAG - GCP specifics (30 min)
Q&A Review Bank - all 80+ Q&A pairs (90 min)

Path C: Weak spots only

System design → 08-RAG-System-Design.md
Retrieval math → 04-Retrieval-Strategies.md
Advanced patterns + taxonomy → 05-RAG-Types-and-Advanced-Patterns.md
Vertex AI / GCP → 09-Vertex-AI-RAG.md

RAG is fundamentally a prompt construction strategy - you retrieve context and inject it into a prompt. You need solid prompting foundations (Topic 02) before tackling RAG. You also need to understand embeddings, which require understanding how LLMs represent text (Topic 01).
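To make the "retrieve context and inject it into a prompt" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the pipeline used in later chapters: the corpus, the function names (`overlap_score`, `retrieve`, `build_prompt`), and the prompt wording are all hypothetical, and simple token overlap stands in for the embedding similarity a real system would use.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would store chunked documents in a vector store.
CORPUS = [
    "BM25 is a sparse retrieval function based on term frequency.",
    "HNSW is a graph-based index for approximate nearest neighbor search.",
    "Reciprocal rank fusion merges ranked lists from several retrievers.",
]

def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    return [w.strip(".,?!").lower() for w in text.split()]

def overlap_score(query, doc):
    """Count shared tokens. Assumption: a stand-in for embedding similarity."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())

def retrieve(query, corpus, k=2):
    """Return the k documents with the highest overlap score."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]

def build_prompt(query, contexts):
    """Inject the retrieved passages into a single prompt string."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )

query = "What is HNSW used for?"
prompt = build_prompt(query, retrieve(query, CORPUS, k=1))
print(prompt)
```

The resulting string is what would be sent to the LLM. Everything RAG-specific happens before generation, which is why the prompting foundations from Topic 02 matter here.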

Why RAG Comes Before MCP and Agents

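RAG solves the knowledge problem for a single-turn pipeline: retrieve context, then generate. MCP generalizes that retrieval step into a protocol for connecting models to external tools and data sources. Agents generalize it further by letting the model decide when and what to retrieve. RAG is the stepping stone to both.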

Previous: 02 - Prompt Engineering | Next: 04 - MCP

Overview