Planning and Reasoning
Why Planning Matters
A purely reactive agent takes one step at a time, deciding what to do based only on the most recent observation. This works for simple tasks. It breaks for complex ones.
Consider the task: "Prepare a comprehensive competitive analysis of EV battery manufacturers for our board presentation next Tuesday."
A reactive agent starts immediately, perhaps by searching for "EV battery manufacturers", without considering:
- What structure the analysis should have
- Which manufacturers to cover and why
- What data needs to be gathered for each manufacturer
- How the steps depend on each other (you need the list of manufacturers before you can research each one)
- What "comprehensive" and "board presentation" imply about depth and format
An agent with planning produces a structured plan first, validates it, then executes. The plan is the agent's theory of how to solve the problem. Execution tests that theory. Replanning corrects it when reality diverges.
The choice of planning strategy — whether to reason step-by-step, plan upfront, explore multiple approaches, or verify outputs — determines both the quality and cost of the result.
Chain of Thought (CoT)
What It Is
Chain of Thought is the simplest "planning" technique: prompt the model to reason through a problem step by step before giving an answer. The reasoning is not action — it's the model thinking out loud before committing.
Without CoT:
Q: "A store sells apples for $1.20 each and oranges for $0.80 each.
If I buy 3 apples and 5 oranges, how much do I spend?"
A: "$7.60" ← may be wrong; no visible reasoning
With CoT:
Q: (same question) "Let's think step by step."
A: "First, the cost of 3 apples: 3 × $1.20 = $3.60
Next, the cost of 5 oranges: 5 × $0.80 = $4.00
Total: $3.60 + $4.00 = $7.60" ← reasoning visible, answer grounded
Variants
Zero-shot CoT: Add "Let's think step by step" to the prompt. No examples needed. Effective for many tasks.
Few-shot CoT: Provide examples that include reasoning chains. The model imitates the reasoning style.
EXAMPLES = """
Q: If a train travels 120km in 2 hours, then 180km in 3 hours, what is its average speed?
A: Total distance = 120 + 180 = 300km. Total time = 2 + 3 = 5 hours. Average speed = 300/5 = 60 km/h.
Q: {user_question}
A: Let me work through this:
"""
Self-consistency: Generate multiple CoT reasoning chains (with higher temperature), take the majority answer. Improves accuracy on math and logic problems at the cost of multiple LLM calls.
from collections import Counter

def self_consistent_answer(question: str, num_samples: int = 5) -> str:
    # `llm` and `extract_final_answer` are assumed to be defined elsewhere
    answers = []
    for _ in range(num_samples):
        # Sample with temperature > 0 so the reasoning chains differ
        response = llm.invoke(
            question + "\nLet's think step by step.",
            temperature=0.7
        )
        answers.append(extract_final_answer(response.content))
    # Return the majority vote
    return Counter(answers).most_common(1)[0][0]
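The majority vote only works if answers that mean the same thing compare equal, so the extraction step matters as much as the sampling. The sketch below is one possible shape for it, not a reference implementation: `extract_final_answer` here is a hypothetical helper that takes the last number in a chain and normalizes it, and the canned chains stand in for sampled model outputs so the example runs without an LLM.

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last number in the text, normalized to two decimals (hypothetical helper)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return f"{float(matches[-1]):.2f}" if matches else None


def majority_vote(sample: Callable[[], str], num_samples: int = 5) -> Optional[str]:
    """Sample several reasoning chains and return the most common final answer."""
    answers = [extract_final_answer(sample()) for _ in range(num_samples)]
    answers = [a for a in answers if a is not None]  # drop chains with no parsable answer
    return Counter(answers).most_common(1)[0][0] if answers else None


# Canned chains standing in for sampled model outputs; the third has an arithmetic slip.
chains = iter([
    "3 x 1.20 = 3.60, 5 x 0.80 = 4.00, total = $7.60",
    "Apples 3.60 plus oranges 4.00 gives $7.60",
    "3 x 1.20 = 3.60, 5 x 0.80 = 3.80, total = $7.40",
])
result = majority_vote(lambda: next(chains), num_samples=3)
print(result)
```

Normalizing before counting means "$7.60" and "7.6" are counted as the same vote. Taking the last number is a crude assumption that suits arithmetic questions only.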
When to Use CoT
- Mathematical and logical reasoning tasks
- Multi-step problems where intermediate steps matter
- Tasks where you want visible reasoning for debugging or trust
- When the model consistently makes errors on a class of problem without explicit reasoning
Limitation: CoT is reasoning, not action. The model still cannot take actions, call tools, or observe external state. It's a thinking strategy, not an agent pattern.
ReAct (Reason + Act)
What It Is
ReAct is the most common agent reasoning pattern. It interleaves explicit reasoning traces with tool calls: Thought → Action → Observation → Thought → ...
Thought: The user wants to know the current CEO of Anthropic and their background.
I should search for this.
Action: search_web("Anthropic CEO 2024")
Observation: Dario Amodei is the CEO of Anthropic. He was previously VP of Research at OpenAI.
Thought: I have the CEO's name and background. The user asked about educational background too.
Let me check for more detail.
Action: search_web("Dario Amodei education background")
Observation: Dario Amodei has a PhD in physics from Princeton University.
Thought: I now have both the position and educational background. I can answer.
Action: [FINAL ANSWER]
"Dario Amodei is the CEO of Anthropic. He holds a PhD in physics from
Princeton University and was VP of Research at OpenAI before co-founding Anthropic."
The "Thought" lines are not tool calls — they are the LLM reasoning out loud about what it knows and what it needs next. This makes the agent's logic visible and gives it the chance to self-correct before taking an action.
ReAct System Prompt
REACT_SYSTEM_PROMPT = """
You are a helpful assistant with access to tools.
For every task, follow this exact format:
Thought: Reason about the current situation and what you need to do next.
Action: Call the appropriate tool.
Observation: (This will be filled in with the tool result)
Thought: Reason about the observation and what to do next.
...repeat until done...
When you have enough information to answer, write:
Thought: I now have all the information needed to answer.
Final Answer: [your complete response to the user]
Rules:
- Always write a Thought before every Action
- Never make up tool results — wait for the real Observation
- If a tool call fails, reason about the failure and try a different approach
- Never call the same tool with identical arguments twice
"""
Why ReAct Works
- Prevents premature action: Writing out "Thought: I need X, so I will call tool Y with parameter Z" catches errors before execution. The LLM often corrects its own reasoning in the Thought step.
- Maintains goal awareness: Each Thought step reconnects the agent to what it is ultimately trying to accomplish. Without this, agents drift toward solving the most recent subproblem and forget the original task.
- Makes debugging possible: When the agent produces a wrong answer, you can read the Thought steps to find exactly where the reasoning went wrong, often before any tool was even called.
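The Thought → Action → Observation cycle can be sketched as a small driver loop. This is a minimal illustration under stated assumptions rather than any framework's API: the model is any callable from transcript to text, tools take a single string argument, and the `Action: name("arg")` syntax, `react_loop`, and the scripted model are all invented for the example.

```python
import re
from typing import Callable, Dict, List

ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)", re.MULTILINE | re.DOTALL)


def react_loop(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    """Alternate model turns and tool calls until a Final Answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        turn = llm(transcript)  # model writes Thought + Action, or a Final Answer
        transcript += turn + "\n"
        final = FINAL_RE.search(turn)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(turn)
        if action is None:
            transcript += "Observation: no valid Action found; use the required format.\n"
            continue
        name, arg = action.group(1), action.group(2).strip().strip('"')
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"  # real tool output, never model-written
    return "Stopped: step limit reached without a final answer."


# Scripted stand-in for a model, so the loop runs without any API access.
script: List[str] = [
    'Thought: I need the capital of France.\nAction: lookup("France")',
    "Thought: I now have all the information needed to answer.\nFinal Answer: Paris",
]
turns = iter(script)
answer = react_loop(
    lambda _prompt: next(turns),
    {"lookup": lambda country: {"France": "Paris"}[country]},
    "Capital of France?",
)
print(answer)
```

The loop, not the model, writes every Observation line, which is what enforces the "never make up tool results" rule. The step limit bounds cost if the model never reaches a final answer.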
ReAct Failure Modes
Thought without grounding
Thought: The CEO is probably Sarah Johnson. ← fabrication
Action: [FINAL ANSWER] "The CEO is Sarah Johnson"
Circular reasoning
Thought: I need to find X. I'll search for X.
Action: search("X")
Observation: No results.
Thought: I need to find X. I'll search for X. ← same thought, same action
Fix: track (thought, action) pairs. If the same pair repeats, force a new strategy: "You've tried this search. Try a different query or approach."
Over-thinking without acting
Thought: I need to find X. But first I should think about how to find X.
To do that, I should think about what X is. X is related to...
[20 lines of reasoning without any tool call]
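Two of these failure modes, circular reasoning and over-thinking, can be caught mechanically outside the model. The sketch below is one hedged way to do it: `LoopGuard` is a hypothetical helper, the thresholds are arbitrary, and it compares only the normalized action (tool plus arguments) because thoughts rarely repeat word for word.

```python
from collections import Counter
from typing import Optional


class LoopGuard:
    """Counts (tool, arguments) pairs and consecutive thoughts (hypothetical helper)."""

    def __init__(self, max_repeats: int = 1, max_thoughts_without_action: int = 3):
        self.max_repeats = max_repeats
        self.max_thoughts_without_action = max_thoughts_without_action
        self.seen: Counter = Counter()
        self.idle_thoughts = 0

    def on_thought(self) -> Optional[str]:
        """Call after each Thought; returns a nudge if the agent is only thinking."""
        self.idle_thoughts += 1
        if self.idle_thoughts > self.max_thoughts_without_action:
            return "You have reasoned enough. Take an action or give a final answer."
        return None

    def on_action(self, tool: str, args: str) -> Optional[str]:
        """Call before each Action; returns a nudge if this exact call was already made."""
        self.idle_thoughts = 0
        key = (tool, " ".join(args.lower().split()))  # normalize case and whitespace
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return "You've tried this search. Try a different query or approach."
        return None


guard = LoopGuard()
first = guard.on_action("search", "X")           # new call: no nudge
second = guard.on_action("search", "  x ")       # same call after normalization: nudge
third = guard.on_action("search", "X history")   # different query: no nudge
print(first, second, third, sep="\n")
```

A returned nudge would be appended to the transcript as an Observation so the model sees it on its next turn.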
Plan-and-Execute
What It Is
Plan-and-Execute separates planning from execution into two distinct phases. Phase 1: the agent produces a complete, structured plan. Phase 2: the plan is executed step by step. If a step fails, the planner is re-invoked to update the remaining plan.
# Phase 1: Planning
PLANNER_PROMPT = """
You are a planning agent. Given a goal, create a detailed execution plan.
Goal: {goal}
Produce a numbered list of concrete, executable steps. Each step should:
- Be specific enough that an agent can execute it without further clarification
- Have a clear, observable outcome
- Be ordered correctly (later steps may depend on earlier ones)
- Be achievable with the available tools: {tool_list}
Format each step as:
Step N: [action description]
Expected outcome: [what success looks like for this step]
Depends on: [step numbers that must complete first, or "none"]
"""
# Phase 2: Execution
EXECUTOR_PROMPT = """
You are an execution agent. Execute the following step and report the result.
Overall goal: {goal}
Current step: {step_description}
Expected outcome: {expected_outcome}
Context from previous steps: {context}
Execute this step using the available tools. Report:
- Whether the step succeeded
- The result
- Any issues encountered
"""
Replanning on Failure
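async def plan_and_execute(goal: str, max_replans: int = 3) -> str:
    # Phase 1: Generate the plan
    plan = await generate_plan(goal)
    completed_steps = []
    replans = 0
    i = 0
    # Index-based loop: plan.steps is replaced during replanning, so a plain
    # `for` loop would keep iterating over the stale list
    while i < len(plan.steps):
        step = plan.steps[i]
        # Execute the step
        result = await execute_step(step, context=completed_steps)
        if result.success:
            completed_steps.append({"step": step, "result": result.output})
            i += 1
            continue
        # Step failed: replan from here, unless the replan budget is spent
        if replans >= max_replans:
            break
        replans += 1
        updated_plan = await replan(
            goal=goal,
            completed=completed_steps,
            failed_step=step,
            failure_reason=result.error,
            remaining_steps=plan.steps[i + 1:]
        )
        # Keep the completed prefix; new steps replace the failed step and all after it
        plan.steps = plan.steps[:i] + updated_plan.steps
    return synthesize_results(goal, completed_steps)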
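Before a plan can be executed step by step, the planner's text output has to become structured data. The sketch below parses the `Step N` / `Expected outcome` / `Depends on` format from the planner prompt and rejects plans whose dependencies point forward or at missing steps. `PlanStep` and `parse_plan_text` are hypothetical names, and the regex assumes the model follows the format exactly.

```python
import re
from dataclasses import dataclass, field
from typing import List

STEP_RE = re.compile(
    r"Step\s+(\d+):\s*(.+?)\s*\n\s*Expected outcome:\s*(.+?)\s*\n\s*Depends on:\s*(.+?)\s*"
    r"(?=\n\s*Step\s+\d+:|\Z)",
    re.DOTALL,
)


@dataclass
class PlanStep:
    number: int
    description: str
    expected_outcome: str
    depends_on: List[int] = field(default_factory=list)


def parse_plan_text(text: str) -> List[PlanStep]:
    """Turn the planner's formatted output into validated PlanStep objects."""
    steps = []
    for num, desc, outcome, deps in STEP_RE.findall(text):
        dep_numbers = [int(d) for d in re.findall(r"\d+", deps)]  # "none" yields []
        steps.append(PlanStep(int(num), desc, outcome, dep_numbers))
    known = {s.number for s in steps}
    for s in steps:
        # A dependency must exist and must come earlier in the plan
        invalid = [d for d in s.depends_on if d not in known or d >= s.number]
        if invalid:
            raise ValueError(f"Step {s.number} has invalid dependencies: {invalid}")
    return steps


sample = """Step 1: List the top EV battery manufacturers
Expected outcome: A list of 5 to 8 company names
Depends on: none
Step 2: Collect capacity figures for each manufacturer
Expected outcome: One capacity figure per company
Depends on: 1
"""
plan_steps = parse_plan_text(sample)
print(plan_steps)
```

Validating at parse time surfaces a malformed plan before any tool is called, which is cheaper than discovering it mid-execution.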
When to Use Plan-and-Execute
| Use Plan-and-Execute When | Use ReAct Instead When |
|---|---|
| The full plan can be determined upfront | The right next step depends on the previous result |
| HITL review of the plan before execution is desired | Dynamic discovery is the core of the task |
| Steps have stable dependencies | Adaptation is more important than planning |
| Tracking progress against a plan matters to users | Speed is the primary concern |
| The task is a known workflow (research → write → review) | The task structure is unknown upfront |
HITL Review of the Plan
One of Plan-and-Execute's biggest advantages: you can show the human the plan and get approval before any action is taken.
async def plan_review_execute(goal: str) -> str:
    plan = await generate_plan(goal)
    # Present plan to human before execution
    approval = await request_hitl_approval(
        action_description="Execute the following plan",
        plan_preview=plan.to_markdown(),
        consequences="This will make real API calls and may take 5-10 minutes."
    )
    if approval.decision == "rejected":
        return f"Plan rejected: {approval.reason}"
    if approval.decision == "modified":
        plan = parse_modified_plan(approval.modified_plan)
    return await execute_plan(plan)
Tree of Thoughts (ToT)
What It Is
Tree of Thoughts models reasoning as a search tree. Instead of committing to a single line of reasoning, the agent explores multiple branches, evaluates each, and selects the most promising path.
Goal: Design a system for real-time fraud detection
[Goal]
  Branch A (Rule-based system)
    A1 (Fast, rigid)
    A2 (Slow, flexible)
  Branch B (ML-based system)
    B1 (Streaming)
    B2 (Batch)
Evaluate each branch:
A1: score 5/10 (fast but brittle, can't adapt to new patterns)
A2: score 4/10 (too slow for real-time)
B1: score 8/10 (best fit: ML model + streaming, adapts over time)
B2: score 6/10 (ML is right, but batch doesn't meet real-time req)
Select B1 → continue expanding from there
BFS vs DFS for ToT
Breadth-First Search (BFS): Explore all options at one level before going deeper. Best when you want to compare alternatives at the same level of abstraction before committing.
Depth-First Search (DFS): Explore each branch fully before trying the next. Best when you have a strong prior that one branch is correct and want to verify it quickly.
Beam Search: Keep the top-K branches at each level (K = beam width). Balances exploration with efficiency. Most common in practice.
async def tree_of_thoughts(
    problem: str,
    branching_factor: int = 3,
    max_depth: int = 3,
    beam_width: int = 2
) -> str:
    # Initialize: generate branching_factor initial thoughts
    thoughts = await generate_thoughts(problem, n=branching_factor)
    best_score, best_thought = float("-inf"), None
    for depth in range(max_depth):
        # Evaluate all current thoughts
        scored_thoughts = []
        for thought in thoughts:
            score = await evaluate_thought(thought, problem)
            scored_thoughts.append((score, thought))
        # Keep the top beam_width thoughts; sort on score only so that ties
        # never fall back to comparing thought objects
        scored_thoughts.sort(key=lambda pair: pair[0], reverse=True)
        if scored_thoughts and scored_thoughts[0][0] > best_score:
            best_score, best_thought = scored_thoughts[0]
        thoughts = [t for _, t in scored_thoughts[:beam_width]]
        # Check if any surviving thought is a final answer
        for thought in thoughts:
            if is_final_answer(thought):
                return thought.answer
        if depth == max_depth - 1:
            break  # depth budget spent: skip expansions that would never be scored
        # Expand: generate next thoughts from each surviving branch
        next_thoughts = []
        for thought in thoughts:
            expansions = await generate_thoughts(thought, n=branching_factor)
            next_thoughts.extend(expansions)
        thoughts = next_thoughts
    # No final answer within max_depth: return the best-scoring thought seen
    return best_thought.answer if best_thought is not None else ""
When ToT Is Worth the Cost
ToT is expensive: generating N branches per step multiplies LLM calls by N, and evaluating each branch adds more. Use it only when:
- The problem has multiple non-obvious solution approaches
- Choosing the wrong approach early leads to dead ends
- Quality is far more important than speed and cost
- The problem is "creative" (writing, design, strategy) rather than factual retrieval
ToT is overkill for most information-retrieval tasks. It excels for complex design problems, strategic planning, and creative generation.
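The cost claim is easy to check with arithmetic before running anything. The sketch below counts calls under simple assumptions: one LLM call per generated thought, one per evaluation, and every generated thought is evaluated once. The function names are illustrative.

```python
def full_tree_calls(branching_factor: int, depth: int) -> int:
    """Thoughts generated when every branch is expanded at every level."""
    return sum(branching_factor ** level for level in range(1, depth + 1))


def beam_search_calls(branching_factor: int, depth: int, beam_width: int) -> dict:
    """Generation and evaluation calls when only beam_width thoughts survive each level."""
    generated = branching_factor  # level 1: expand the root
    frontier = branching_factor
    for _ in range(depth - 1):
        survivors = min(beam_width, frontier)
        frontier = survivors * branching_factor  # each survivor is expanded
        generated += frontier
    return {"generated": generated, "evaluated": generated, "total": 2 * generated}


print(full_tree_calls(3, 3))                   # 3 + 9 + 27 = 39
print(beam_search_calls(3, 3, beam_width=2))   # 3 + 6 + 6 = 15 generated, 30 calls in total
```

With branching factor 3 and depth 3, a beam of width 2 generates 15 thoughts instead of 39. Counting evaluations, that is still 30 calls against 3 for a linear chain, so profiling before deployment is worthwhile.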
FLARE (Forward-Looking Active Retrieval)
What It Is
FLARE (Forward-Looking Active REtrieval augmented generation) solves a specific problem: generating long documents where the LLM needs to retrieve information at multiple points, not just once at the start.
In standard RAG, retrieval happens once before generation. For a long document, the retrieved context may be relevant for the first few paragraphs but not the later ones.
FLARE retrieves iteratively: generate a bit, detect where you're uncertain, retrieve relevant information for that uncertainty, continue generating.
Task: "Write a detailed report on the current state of quantum computing"
Standard RAG:
1. Retrieve documents about quantum computing (once)
2. Generate the entire report from that context
Problem: Context retrieved at step 1 may not cover specific sections (e.g., quantum error correction details)
FLARE:
1. Generate paragraph 1 (introduction — no retrieval needed)
2. Detect uncertainty: "I need to write about the current state of IBM's quantum systems..."
3. Retrieve: search("IBM quantum computer 2024 qubits")
4. Generate paragraph 2 using retrieved context
5. Continue generating paragraph 3
6. Detect uncertainty: "I should discuss recent breakthroughs in error correction..."
7. Retrieve: search("quantum error correction recent advances 2024")
8. Generate paragraph 3 using new context
9. Continue...
Implementation
def flare_generate(task: str, retrieval_threshold: float = 0.5) -> str:
    output_so_far = ""
    while not is_complete(output_so_far, task):
        # Generate the next sentence or paragraph
        next_chunk = llm.generate(
            prompt=f"Task: {task}\n\nSo far:\n{output_so_far}\n\nContinue:",
            max_tokens=200
        )
        # Estimate confidence (look for hedging language or explicit uncertainty)
        confidence = estimate_generation_confidence(next_chunk)
        if confidence < retrieval_threshold:
            # Low confidence: retrieve before committing to this chunk
            query = generate_retrieval_query(task, output_so_far, next_chunk)
            retrieved_context = retrieve(query)
            # Regenerate the chunk with the retrieved context
            next_chunk = llm.generate(
                prompt=f"Task: {task}\n\nContext:\n{retrieved_context}\n\nSo far:\n{output_so_far}\n\nContinue:",
                max_tokens=200
            )
        output_so_far += next_chunk
    return output_so_far

def estimate_generation_confidence(text: str) -> float:
    # Simple heuristic: check for hedging language
    hedging_phrases = ["I think", "I believe", "probably", "I'm not sure", "approximately", "around"]
    hedge_count = sum(1 for phrase in hedging_phrases if phrase.lower() in text.lower())
    return max(0.0, 1.0 - (hedge_count * 0.25))
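Hedging phrases are a rough proxy for uncertainty. When the model API exposes per-token log-probabilities, a closer match to how FLARE is usually described is to trigger retrieval whenever a token in the draft chunk falls below a probability threshold, and to build the query from the tokens the model was confident about. The sketch below assumes the draft arrives as `(token, logprob)` pairs. That input shape, the function names, and the example values are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

TokenLogprobs = List[Tuple[str, float]]


def needs_retrieval(token_logprobs: TokenLogprobs, threshold: float = 0.5) -> bool:
    """True if any token in the draft chunk has probability below the threshold."""
    return any(math.exp(lp) < threshold for _, lp in token_logprobs)


def masked_query(token_logprobs: TokenLogprobs, threshold: float = 0.5) -> str:
    """Build a retrieval query from the draft, dropping the low-confidence tokens."""
    kept = [tok for tok, lp in token_logprobs if math.exp(lp) >= threshold]
    return " ".join(kept)


# Made-up draft: the model is confident about everything except the number.
draft = [("The", -0.05), ("processor", -0.10), ("has", -0.02),
         ("1000", -2.30), ("qubits", -0.08)]

trigger = needs_retrieval(draft)
query = masked_query(draft)
print(trigger, query)
```

Dropping the uncertain token keeps a possibly wrong guess out of the search query, so retrieval is steered by what the model is sure of.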
Reflexion (Self-Evaluation + Revision)
What It Is
Reflexion is a pattern where the agent evaluates its own output (or a separate evaluator agent evaluates it), generates a verbal "reflection" about what went wrong, and uses that reflection to improve the next attempt. This is different from simple Reflection (covered in Design Patterns) in that the feedback is stored as a "verbal reinforcement signal" that persists across attempts.
Attempt 1:
Task: "Write a Python function to reverse a linked list"
Output: [code with a bug — doesn't handle single-node lists]
Test results: 3/5 tests pass
Reflection:
"The function fails when the list has a single node because I didn't check
for self.next being None before setting next.prev. I should add a base case
check at the start: if head is None or head.next is None, return head."
Attempt 2: [uses the reflection as additional context]
Output: [corrected code]
Test results: 5/5 tests pass
Implementation
class ReflexionAgent:
    def __init__(self, max_attempts: int = 4):
        self.max_attempts = max_attempts
        self.reflections = []

    def solve(self, task: str) -> str:
        solution = ""
        for attempt in range(self.max_attempts):
            # Build context with accumulated reflections
            context = task
            if self.reflections:
                context += "\n\nLessons from previous attempts:\n" + "\n".join(
                    f"- Attempt {i+1}: {r}" for i, r in enumerate(self.reflections)
                )
            # Generate solution
            solution = llm.invoke(context).content
            # Evaluate (self.evaluate is task-specific, e.g. running a test suite)
            eval_result = self.evaluate(task, solution)
            if eval_result.success:
                return solution
            # Generate reflection for next attempt
            reflection = self.generate_reflection(
                task=task,
                solution=solution,
                failure_reason=eval_result.failure_reason,
                test_results=eval_result.test_results
            )
            self.reflections.append(reflection)
        return solution  # attempts exhausted: this is the last attempt, not necessarily the best

    def generate_reflection(self, task: str, solution: str,
                            failure_reason: str, test_results: dict) -> str:
        reflection_prompt = f"""
        Task: {task}
        My solution: {solution}
        Failure reason: {failure_reason}
        Test results: {test_results}
        What specifically did I do wrong? What should I do differently next time?
        Be specific and actionable. Focus on what to change, not what was right.
        """
        return llm.invoke(reflection_prompt).content
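The agent above relies on an `evaluate` step that it never defines, and Reflexion is only as good as that signal. For code tasks, one plausible evaluator runs the candidate against test cases and reports exactly which ones failed. The sketch below is an assumption-laden illustration: `EvalResult` and `evaluate_code` are hypothetical, the example reverses a Python list rather than a linked list to stay short, and `exec` is acceptable only for trusted input.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class EvalResult:
    success: bool
    failure_reason: str = ""
    test_results: Dict[str, bool] = field(default_factory=dict)


def evaluate_code(source: str, func_name: str,
                  cases: List[Tuple[tuple, object]]) -> EvalResult:
    """Run candidate source against (args, expected) cases and summarize failures."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # trusted input only; sandbox real model output
        func: Callable = namespace[func_name]
    except Exception as exc:
        return EvalResult(False, f"could not load {func_name}: {exc!r}")
    results, failures = {}, []
    for args, expected in cases:
        label = f"{func_name}{args}"
        try:
            actual = func(*args)
            results[label] = actual == expected
            if actual != expected:
                failures.append(f"{label} returned {actual!r}, expected {expected!r}")
        except Exception as exc:
            results[label] = False
            failures.append(f"{label} raised {exc!r}")
    return EvalResult(not failures, "; ".join(failures), results)


buggy = "def reverse(items):\n    return items[:0:-1]\n"   # drops the first element
fixed = "def reverse(items):\n    return items[::-1]\n"
cases = [(([1, 2, 3],), [3, 2, 1]), (([],), []), (([7],), [7])]

first_try = evaluate_code(buggy, "reverse", cases)
second_try = evaluate_code(fixed, "reverse", cases)
print(first_try.failure_reason)
print(second_try.success)
```

The `failure_reason` string is what makes the reflection specific: it names the failing input and the wrong output, instead of only reporting that some tests failed.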
Goal Decomposition
Why Decomposition Is Hard
The right decomposition of a goal into subtasks is not obvious. Bad decomposition leads to:
- Over-decomposition: so many tiny tasks that coordination overhead dominates
- Under-decomposition: subtasks that are still too complex for a single agent
- Wrong seams: splitting at artificial boundaries instead of natural ones
- Hidden dependencies: step 3 needs the output of step 1, not step 2, but this isn't obvious
Decomposition Strategies
Functional decomposition: Split by what each piece does (research, write, review). Natural for creative or analytical tasks.
Goal: "Write a market analysis report"
→ Research phase: gather data
→ Analysis phase: interpret data
→ Writing phase: produce report
→ Review phase: check quality
Object decomposition: Split by what entity each piece operates on. Natural for tasks that touch multiple independent entities.
Goal: "Summarize Q3 performance for all 5 regional teams"
→ Subtask 1: summarize Team A's Q3 performance
→ Subtask 2: summarize Team B's Q3 performance
→ ...parallel execution possible...
→ Subtask 6: synthesize all summaries
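Object decomposition maps naturally onto a fan-out/fan-in shape: independent subtasks run concurrently, then a single step combines them. The sketch below shows that shape with `asyncio.gather`. The `summarize_team` and `synthesize` coroutines are stand-ins for real agent calls and are not part of the original text.

```python
import asyncio
from typing import List


async def summarize_team(team: str) -> str:
    """Stand-in for an agent call; a real version would invoke a model or tool."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"{team}: Q3 summary"


async def synthesize(summaries: List[str]) -> str:
    """Fan-in step that depends on every per-team summary."""
    return " | ".join(summaries)


async def summarize_all(teams: List[str]) -> str:
    # Fan out: the per-team subtasks share no state, so they can run concurrently.
    summaries = await asyncio.gather(*(summarize_team(t) for t in teams))
    return await synthesize(list(summaries))  # gather preserves input order


report = asyncio.run(summarize_all(["Team A", "Team B", "Team C"]))
print(report)
```

Because `gather` returns results in input order, the synthesis step does not need to re-sort anything even when subtasks finish out of order.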
Dependency-first decomposition: Start by identifying what depends on what, then order from least to most dependent.
import json

def decompose_with_dependencies(goal: str) -> list[Subtask]:
    # Ask the LLM to identify tasks AND their dependencies
    decomposition_prompt = f"""
    Goal: {goal}
    List the subtasks needed to complete this goal. For each subtask:
    1. Give it a short ID (task_1, task_2, etc.)
    2. Describe what it does
    3. List which other tasks it depends on (must complete first)
    Format as JSON:
    [
    {{"id": "task_1", "description": "...", "depends_on": []}},
    {{"id": "task_2", "description": "...", "depends_on": ["task_1"]}},
    ...
    ]
    """
    raw = llm.invoke(decomposition_prompt).content
    # extract_json (assumed helper) strips any prose around the JSON array
    subtasks_data = json.loads(extract_json(raw))
    return [Subtask(**data) for data in subtasks_data]
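Once the model returns the dependency list, it is worth validating and ordering it before executing anything. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into waves that can run in parallel. It rejects references to unknown tasks and surfaces circular dependencies as `CycleError`. The `execution_waves` name and the sample tasks are illustrative.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def execution_waves(subtasks: List[dict]) -> List[List[str]]:
    """Group task IDs into waves; tasks in one wave have all dependencies met."""
    ids = {t["id"] for t in subtasks}
    graph: Dict[str, set] = {}
    for task in subtasks:
        unknown = set(task["depends_on"]) - ids
        if unknown:
            raise ValueError(f"{task['id']} depends on unknown tasks: {sorted(unknown)}")
        graph[task["id"]] = set(task["depends_on"])
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


tasks = [
    {"id": "task_1", "description": "list manufacturers", "depends_on": []},
    {"id": "task_2", "description": "research capacity", "depends_on": ["task_1"]},
    {"id": "task_3", "description": "research pricing", "depends_on": ["task_1"]},
    {"id": "task_4", "description": "write comparison", "depends_on": ["task_2", "task_3"]},
]
waves = execution_waves(tasks)
print(waves)
```

Here `task_2` and `task_3` land in the same wave, so they can run concurrently, while `task_4` waits for both.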
Choosing the Right Strategy
| Strategy | Best For | Cost | Quality |
|---|---|---|---|
| Chain of Thought | Math, logic, step-by-step reasoning | Low (1 call) | Medium |
| ReAct | General-purpose agentic tasks with tools | Medium | Medium-High |
| Plan-and-Execute | Known workflows, HITL plan review | Medium-High | High |
| Tree of Thoughts | Complex design, strategy, creative problems | High (N× branches) | Very High |
| FLARE | Long document generation, research reports | Medium (proportional to length) | High |
| Reflexion | Code generation, iterative refinement | Medium (proportional to attempts) | High |
Decision rules:
- Start with ReAct — it handles the majority of tasks well
- Use Plan-and-Execute when you need HITL approval before execution or when the task is a known workflow
- Use Tree of Thoughts when the problem has multiple non-obvious solution approaches and quality outweighs cost
- Use FLARE when generating long documents that require retrieval at multiple points
- Use Reflexion when output quality on the first attempt is consistently insufficient and there's a clear evaluable quality criterion
- Use CoT as a supporting technique inside any of the above when pure reasoning steps are needed
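The decision rules above can be written down as a small routing function, which makes their precedence explicit. The order of the checks below is an assumption, since the rules do not say which wins when several apply, and the `TaskProfile` fields are invented labels for the conditions they mention.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    needs_tools: bool = True
    needs_plan_approval: bool = False
    known_workflow: bool = False
    many_candidate_approaches: bool = False
    quality_over_cost: bool = False
    long_document_with_retrieval: bool = False
    has_automatic_evaluator: bool = False
    first_attempts_often_fail: bool = False


def choose_strategy(task: TaskProfile) -> str:
    """Map task traits to a strategy; earlier checks take precedence (an assumption)."""
    if task.needs_plan_approval or task.known_workflow:
        return "Plan-and-Execute"
    if task.many_candidate_approaches and task.quality_over_cost:
        return "Tree of Thoughts"
    if task.long_document_with_retrieval:
        return "FLARE"
    if task.has_automatic_evaluator and task.first_attempts_often_fail:
        return "Reflexion"
    if not task.needs_tools:
        return "Chain of Thought"  # pure reasoning, nothing to act on
    return "ReAct"  # the default


print(choose_strategy(TaskProfile()))
print(choose_strategy(TaskProfile(needs_plan_approval=True)))
```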
Dynamic Replanning
Even well-designed plans fail when reality doesn't match assumptions. A robust agent detects failure and updates the plan without starting over.
class PlanExecutor:
    def __init__(self, max_replan_attempts: int = 3):
        self.max_replan_attempts = max_replan_attempts

    async def execute(self, goal: str, initial_plan: Plan) -> Result:
        plan = initial_plan
        completed = []
        replan_count = 0
        while not plan.is_complete() and replan_count < self.max_replan_attempts:
            next_steps = plan.next_executable()
            if not next_steps:
                # Unfinished plan with nothing runnable: unmet or circular
                # dependencies. Stop instead of looping forever.
                return Result.partial(completed, error="No executable steps remain")
            for step in next_steps:
                result = await self.execute_step(step)
                if result.success:
                    completed.append({"step": step, "output": result.output})
                    plan.mark_done(step.id, result.output)
                else:
                    replan_count += 1
                    # Replan: update remaining steps given the failure
                    updated_remaining = await self.replan(
                        goal=goal,
                        completed=completed,
                        failed_step=step,
                        failure_reason=result.error
                    )
                    plan.replace_remaining(updated_remaining)
                    break  # restart the step loop with new plan
        if not plan.is_complete():
            return Result.partial(completed, error="Replanning attempts exhausted")
        return Result.success(synthesize(goal, completed))

    async def replan(self, goal: str, completed: list,
                     failed_step: Subtask, failure_reason: str) -> list[Subtask]:
        replan_prompt = f"""
        Goal: {goal}
        Completed steps:
        {format_completed(completed)}
        Failed step: {failed_step.description}
        Failure reason: {failure_reason}
        Given this failure, what is the best revised plan for the remaining work?
        Preserve as much completed work as possible.
        Account for the fact that {failed_step.description} failed and cannot be retried as-is.
        """
        return parse_plan(llm.invoke(replan_prompt).content)
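Replanning is expensive, so it helps to separate transient failures, which deserve a retry, from failures that call for a new plan. The sketch below retries a step with exponential backoff on errors assumed to be transient and raises a `NeedsReplan` signal for everything else. The choice of `TimeoutError` and `ConnectionError` as the transient set, the `NeedsReplan` and `run_step` names, and the delays are all assumptions made for illustration.

```python
import time
from typing import Callable

TRANSIENT = (TimeoutError, ConnectionError)  # assumed to be worth retrying


class NeedsReplan(Exception):
    """Raised when a step fails in a way that retrying cannot fix."""


def run_step(step: Callable[[], str], max_retries: int = 3, base_delay: float = 0.01) -> str:
    """Retry transient failures with backoff; escalate everything else to the planner."""
    for attempt in range(max_retries + 1):
        try:
            return step()
        except TRANSIENT as exc:
            if attempt == max_retries:
                raise NeedsReplan(f"still failing after {max_retries} retries: {exc!r}") from exc
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        except Exception as exc:
            raise NeedsReplan(f"non-transient failure: {exc!r}") from exc
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice with a timeout, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timed out")
    return "ok"


def broken() -> str:
    """Fails in a way no retry can fix."""
    raise KeyError("tool 'fetch_filings' is not registered")


outcome = run_step(flaky)
print(outcome, calls["n"])
```

An executor would catch `NeedsReplan` and hand the message to the replanner as the failure reason, so the planner only sees failures that a retry could not absorb.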
Study Notes
- ReAct is the default. Use it unless you have a specific reason for something more complex. It handles more tasks than you'd expect and is much easier to debug than elaborate planning strategies.
- Plan-and-Execute's hidden superpower is HITL. Being able to show a human the plan before executing anything — and getting approval or modification — is enormously valuable for high-stakes tasks. This alone often justifies the added complexity.
- Tree of Thoughts is a premium tool. The branching factor rapidly multiplies your LLM call count. Profile the cost before deploying. It's justified for one-time complex decisions, not for high-volume routine tasks.
- Replanning is different from retrying. Retrying is running the same step again. Replanning is reconsidering the remaining work given that the step failed. Retrying is for transient errors; replanning is for fundamental approach failures.
- Decomposition quality determines everything downstream. A bad decomposition — wrong seams, hidden dependencies, wrong granularity — will cause failures regardless of which reasoning strategy you use. Invest time in the decomposition prompt.
Q&A Review Bank
Q1: What is the difference between Chain of Thought and ReAct? [Easy]
A: Chain of Thought is a reasoning technique where the model thinks step by step before producing an answer — it's pure text generation, no tool calls, no external actions. ReAct (Reason + Act) is an agent pattern that interleaves explicit reasoning traces (Thought) with tool calls (Action) and tool results (Observation). CoT gives the model more reasoning steps within a single LLM call; ReAct enables the agent to take actions and observe results across multiple LLM calls in a loop. CoT improves answers to complex questions; ReAct enables agents to retrieve information, execute code, and interact with external systems. In practice, ReAct uses CoT-style reasoning within each Thought step.
Q2: When should you choose Plan-and-Execute over ReAct? [Medium]
A: Choose Plan-and-Execute when: (1) the full task structure can be determined upfront — you know what the steps are before executing any of them; (2) HITL review of the plan before execution is required — you need a human to see and approve the approach before any action is taken; (3) the task is a known workflow (research → write → review) where the structure is predictable; (4) transparency and progress tracking matter — a plan with checkable steps is easier to report to users. Use ReAct when: the right next step depends on the result of the previous step (dynamic discovery), the task structure is unknown upfront, or speed is the primary concern and overhead of planning is not justified.
Q3: What is Tree of Thoughts and what cost does it introduce? [Medium]
A: Tree of Thoughts models reasoning as a search tree: instead of committing to one reasoning path, the agent generates multiple candidate thoughts (branches), evaluates each, and continues from the most promising. This allows exploring multiple solution approaches before committing. The cost is multiplicative: with a branching factor of 3, every expansion requires 3× the generation calls of a linear approach, plus additional evaluation calls. A fully expanded tree with branching factor 3 and depth 3 generates 3 + 9 + 27 = 39 thoughts, versus 3 calls for a linear chain, and that is before counting evaluation calls; pruning to a beam reduces the total but does not remove the multiplier. ToT is justified only when: the problem has non-obvious solution paths, choosing the wrong path leads to dead ends, and quality significantly outweighs cost.
Q4: What is FLARE and what specific problem does it solve that standard RAG does not? [Hard]
A: FLARE (Forward-Looking Active REtrieval augmented generation) solves iterative retrieval for long document generation. Standard RAG retrieves documents once at the start and generates the entire output from that context. For short answers this works; for long reports or documents, the initial retrieval may be relevant for the first few sections but not later sections that need different facts. FLARE retrieves iteratively: it generates incrementally, detects when it's uncertain (hedging language, unknown specifics), generates a retrieval query for that specific uncertainty, retrieves new context, and continues generation. This keeps each section of a long output grounded in context retrieved for that section, rather than hallucinated from model weights or extrapolated from context retrieved for a different section.
Q5: What is the difference between retrying a failed step and replanning? [Medium]
A: Retrying executes the same step again with the same approach. It is appropriate for transient failures (network timeout, rate limit, temporary service outage) where the step itself is correct and the failure was environmental. Replanning reconsiders the remaining work given that a step failed. It is appropriate for fundamental approach failures where the step cannot succeed as designed (the tool doesn't exist, the data isn't available, the approach was wrong). Retrying a fundamentally broken step wastes time and money. Replanning rethinks the remaining subtasks to achieve the goal through a different path, preserving all work completed before the failure. A robust agent classifies failures before retrying: transient errors → retry with backoff; fundamental errors → replan.
Q6: Why is goal decomposition considered a critical skill and what are the three most common decomposition mistakes? [Hard]
A: Decomposition quality determines the quality of everything downstream — a bad decomposition leads to coordination overhead, execution failures, and replanning that could have been avoided. The three most common mistakes: (1) Wrong seams — splitting at artificial boundaries rather than natural ones (e.g., splitting "write paragraph 1" and "write paragraph 2" as separate subtasks when the agent can't write paragraph 2 without knowing how paragraph 1 ends; the natural seam is between research/writing/review phases). (2) Hidden dependencies — creating subtasks that depend on each other in ways not captured in the dependency graph, causing deadlocks or incorrect ordering at execution time; uncovering dependencies requires explicitly asking the model to reason about what each task needs as inputs. (3) Wrong granularity — subtasks that are either so granular they add coordination overhead without benefit, or so coarse that each subtask is still too complex for a single agent to complete reliably.