
Week 8: Implementing the Planning Agent

Phase 3 · Week 8 · Advanced · Lecture: 2026-04-21

Concepts

Explain why the planner is the bottleneck of a 9-phase SDLC and classify the design decisions a worker alone cannot absorb.

Design

Author a spec.md schema (objective · scope · constraints · acceptance · escalation) and define its validation gate.

Implementation

Implement the planner with codebase-analysis and spec-generation as separate stages, and automate spec validation.

Operations

Detect spec drift (requirements vs implementation) and run a fix-flow checklist when it appears.


Why the Planner Agent Is the Pipeline Bottleneck


Recall the 9-phase agentic SDLC we designed in Week 7. Phases 1–3 (requirements gathering → architecture design → task decomposition) fall entirely within the planner’s domain. The quality of the artifacts produced in these three phases determines the success rate of Phases 4–9 as a whole.

Intuitively: no matter how capable the coder agent is, if the spec.md it receives contains nothing more than “add user authentication” in a single line, it cannot generate correct code. Conversely, when acceptance criteria are clear and testable, even a small model can generate adequate code.

MetaGPT (ICLR 2024) demonstrates this empirically. Across the PM → Architect → Engineer sequence, it reports a correlation of 0.72 between the quality of SOP (Standard Operating Procedure) documents generated by each role and the quality of the final code. Notably, the clarity of the PRD (Product Requirements Document) written by the PM role was the strongest predictor.

The /spec skill in sdlc-toolkit implements this insight. Before processing a new requirement, it greps existing LESSON files in the project to inject past failure patterns into context. Not repeating the same mistakes — that is the difference between a deterministic pipeline and a simple LLM call.
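
The LESSON lookup described above can be sketched as follows. This is a minimal illustration, assuming lessons are stored as Markdown files named LESSON*.md somewhere under the project root. The actual sdlc-toolkit layout and matching rules may differ, and `collect_lessons` is a hypothetical helper name.

```python
from pathlib import Path


def collect_lessons(project_root: Path, keywords: list[str], max_chars: int = 2000) -> str:
    """Return the text of LESSON files that mention any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    # Assumption: lessons are Markdown files named LESSON*.md anywhere in the repo.
    for path in sorted(project_root.rglob("LESSON*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if any(k in text.lower() for k in lowered):
            hits.append(f"### {path.name}\n{text.strip()}")
    # Cap the total size so past lessons cannot crowd out the new requirement.
    return "\n\n".join(hits)[:max_chars]
```

The returned string would be prepended to the planner prompt. Capping it with `max_chars` is one simple way to keep past lessons from dominating the context.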


The planner agent transforms vague human requirements into concrete tasks that a coder agent can process. This transformation is not a single step — it is implemented as a 2-stage separation pattern.

2-Stage Separation Pattern: /spec → /architect

sdlc-toolkit intentionally separates requirements specification from architecture design. This separation applies the Separation of Concerns principle from software engineering to agent design.

  • /spec (“What”): requirements, acceptance criteria, constraints, and out-of-scope items. Implementation details are not mentioned.
  • /architect (“How”): architecture decisions, module decomposition, TASK file generation, and dependency DAG construction.

The practical benefit of this separation: during the spec phase, the LLM can focus on the completeness of requirements without getting buried in implementation details. During the architect phase, it searches for the optimal structure given already-confirmed requirements.

Flow: Requirement Spec → Architecture Document + TASK files (with dependency DAG)
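
One way to make the separation concrete is to give each stage its own output type, so the architect stage can only consume a confirmed spec. The sketch below is illustrative only: `Spec`, `Task`, and `plan_in_two_stages` are hypothetical names, and the two stages are passed in as plain callables rather than real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Spec:
    """Output of the "what" stage: requirements only, no implementation details."""
    objective: str
    acceptance_criteria: list[str]
    constraints: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Task:
    """Output unit of the "how" stage."""
    id: str
    description: str
    dependencies: list[str] = field(default_factory=list)


def plan_in_two_stages(
    requirement: str,
    spec_stage: Callable[[str], Spec],
    architect_stage: Callable[[Spec], list[Task]],
) -> tuple[Spec, list[Task]]:
    """Run the "what" stage, gate on its output, then run the "how" stage."""
    spec = spec_stage(requirement)
    # Gate between stages: architecture work starts only from a testable spec.
    if not spec.acceptance_criteria:
        raise ValueError("spec has no acceptance criteria; refusing to start architecture")
    return spec, architect_stage(spec)
```

Because the architect stage receives a `Spec` rather than the raw requirement, the gate between the two stages has an obvious place to live.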

“Add user authentication” (vague human requirement)
→ Planner Agent
→ spec.md
  • task-001: JWT token generation module (auth/jwt.py)
  • task-002: Password hashing utility (auth/hash.py)
  • task-003: Login API endpoint (api/auth.py:45-80)
  • task-004: Authentication middleware (middleware/auth.py)
Each task includes acceptance criteria.

planner_agent.py
import os
import anthropic
import json
from pathlib import Path

SYSTEM_PROMPT = """You are a planner agent acting as a software architect.
Analyze the given requirements and codebase, then generate a concrete, actionable task list.
Output format: JSON
{
  "tasks": [
    {
      "id": "task-XXX",
      "description": "Specific implementation details",
      "target_files": ["file paths"],
      "dependencies": ["task-XXX"],
      "acceptance_criteria": ["verification conditions"]
    }
  ]
}"""


class PlannerAgent:
    def __init__(self):
        self.client = anthropic.Anthropic()

    def analyze_codebase(self, project_root: Path) -> str:
        """Summarize codebase structure as text"""
        structure = []
        for f in project_root.rglob("*.py"):
            structure.append(f"- {f.relative_to(project_root)}")
        return "\n".join(structure)

    def plan(self, requirement: str, project_root: Path) -> dict:
        codebase = self.analyze_codebase(project_root)
        response = self.client.messages.create(
            model=os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-5"),
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            messages=[{
                "role": "user",
                "content": f"Requirement: {requirement}\n\nCodebase:\n{codebase}"
            }]
        )
        return json.loads(response.content[0].text)

    def generate_spec_md(self, plan: dict, output_path: Path):
        """Convert JSON plan to a Markdown specification document"""
        with open(output_path, 'w') as f:
            f.write("# Project Specification\n\n")
            for task in plan['tasks']:
                f.write(f"## {task['id']}: {task['description']}\n")
                f.write(f"- Target files: {', '.join(task['target_files'])}\n")
                f.write("- Acceptance criteria:\n")
                for criterion in task['acceptance_criteria']:
                    f.write(f"  - [ ] {criterion}\n")
                f.write("\n")
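
The plan() method above passes the raw response text straight to json.loads, which fails if the model wraps its JSON in a code fence or adds a sentence around it. A tolerant parser is one possible mitigation. The sketch below is a hypothetical helper (`extract_json` is not part of the anthropic SDK) and assumes the response contains a single top-level JSON object.

```python
import json
import re

# Build the fence marker programmatically so this snippet contains no literal fence.
_FENCE = "`" * 3
_FENCED_BLOCK = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def extract_json(text: str) -> dict:
    """Parse a JSON object from model output, tolerating fences and surrounding prose."""
    match = _FENCED_BLOCK.search(text)
    candidate = match.group(1) if match else text
    # Keep only the outermost {...} span to drop any leading or trailing prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start:end + 1])
```

With such a helper, the last line of plan() could become `return extract_json(response.content[0].text)`. Malformed JSON still raises, which is the desired behavior for a validation gate.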

Specification Validation and the Dependency DAG


After the basic implementation, two core capabilities must be added: automatic validation of specification quality and construction of the task dependency DAG.

# Methods added to PlannerAgent

def validate_spec(self, plan: dict) -> tuple[bool, list[str]]:
    """Automatically validate the quality of a generated specification"""
    issues = []
    for task in plan['tasks']:
        # Validate completeness of acceptance criteria
        if not task.get('acceptance_criteria'):
            issues.append(f"{task['id']}: acceptance_criteria missing")
        elif len(task['acceptance_criteria']) < 2:
            issues.append(f"{task['id']}: at least 2 acceptance_criteria required")
        # Check for presence of assumptions
        if 'assumptions' not in task:
            issues.append(f"{task['id']}: assumptions field missing")
    # Check whether out-of-scope is stated (at spec level, not task level).
    # Done once, outside the loop, so that every task is still validated.
    if 'out_of_scope' not in plan:
        issues.append("spec-level out_of_scope not defined")
    return len(issues) == 0, issues

def create_task_dag(self, plan: dict, output_dir: Path):
    """Generate TASK files and build the dependency DAG (implements Week 7 tier concept)"""
    task_map = {t['id']: t for t in plan['tasks']}
    # Tier calculation: tasks with no dependencies = tier 1,
    # tasks with dependencies = max(dep_tier) + 1
    # Assumes dependencies are acyclic and refer to known task IDs
    tiers = {}

    def get_tier(task_id: str) -> int:
        if task_id in tiers:
            return tiers[task_id]
        task = task_map[task_id]
        deps = task.get('dependencies', [])
        if not deps:
            tiers[task_id] = 1
        else:
            tiers[task_id] = max(get_tier(d) for d in deps) + 1
        return tiers[task_id]

    for task in plan['tasks']:
        get_tier(task['id'])
    # Generate TASK files
    output_dir.mkdir(parents=True, exist_ok=True)
    for task in plan['tasks']:
        task_file = output_dir / f"{task['id']}.md"
        with open(task_file, 'w') as f:
            f.write(f"---\nid: {task['id']}\ntier: {tiers[task['id']]}\n")
            f.write(f"dependencies: {task.get('dependencies', [])}\n---\n\n")
            f.write(f"## {task['description']}\n\n")
            f.write("### Acceptance Criteria\n")
            for ac in task['acceptance_criteria']:
                f.write(f"- [ ] {ac}\n")
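
The recursive tier calculation above recurses forever on a dependency cycle and raises a bare KeyError on an unknown task ID. LLM-generated plans can contain both, so a guarded variant may be worth having. The following is a minimal sketch under that assumption. `compute_tiers` and `group_by_tier` are hypothetical standalone helpers, not sdlc-toolkit functions.

```python
from collections import defaultdict


def compute_tiers(tasks: list[dict]) -> dict[str, int]:
    """Tier 1 = no dependencies; otherwise 1 + the highest tier among dependencies."""
    task_map = {t["id"]: t for t in tasks}
    tiers: dict[str, int] = {}
    in_progress: set[str] = set()  # tasks on the current recursion path

    def visit(task_id: str) -> int:
        if task_id in tiers:
            return tiers[task_id]
        if task_id not in task_map:
            raise KeyError(f"unknown dependency: {task_id}")
        if task_id in in_progress:
            raise ValueError(f"dependency cycle through {task_id}")
        in_progress.add(task_id)
        deps = task_map[task_id].get("dependencies", [])
        tiers[task_id] = 1 + max((visit(d) for d in deps), default=0)
        in_progress.discard(task_id)
        return tiers[task_id]

    for task_id in task_map:
        visit(task_id)
    return tiers


def group_by_tier(tiers: dict[str, int]) -> list[list[str]]:
    """Return task IDs grouped into waves; tasks within one wave can run concurrently."""
    waves: dict[int, list[str]] = defaultdict(list)
    for task_id, tier in tiers.items():
        waves[tier].append(task_id)
    return [sorted(waves[t]) for t in sorted(waves)]
```

For the four authentication tasks, if task-003 depends on task-001 and task-002, and task-004 depends on task-003, the waves are [task-001, task-002], then [task-003], then [task-004].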

The existing analyze_codebase() returns only a file list. The context a coder agent actually needs is far richer.

Let’s look at the 3-stage analysis pattern (structure → semantics → context) used by the /architect skill in sdlc-toolkit. The first stage, structure analysis, looks like this:

def analyze_structure(self, project_root: Path) -> dict:
    """Build file/directory/module tree"""
    structure = {"files": [], "directories": set(), "modules": []}
    for f in project_root.rglob("*.py"):
        rel = f.relative_to(project_root)
        structure["files"].append(str(rel))
        structure["directories"].add(str(rel.parent))
        # Directories containing __init__.py = Python packages
        if (f.parent / "__init__.py").exists():
            # Join path parts with dots so this also works with Windows separators;
            # the project root itself has no parts and is skipped
            module = ".".join(rel.parent.parts)
            if module and module not in structure["modules"]:
                structure["modules"].append(module)
    structure["directories"] = sorted(structure["directories"])
    return structure

An analyze_codebase_full() method integrating all three stages summarizes each stage’s output to produce a compact representation for injection into the LLM. A structural summary — not the full source — increases context efficiency by more than 10×.
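
The semantics stage (signatures and imports) can be approximated with the standard-library ast module, which parses source without executing it. The sketch below is one possible shape for that stage, not the sdlc-toolkit implementation. `extract_signatures` and `extract_imports` are hypothetical helper names, and only positional parameters are listed, to keep the summary short.

```python
import ast

_FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _format_function(node) -> str:
    """Render 'def name(args) -> return_type' for a function node."""
    args = ", ".join(a.arg for a in node.args.args)
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"def {node.name}({args}){returns}"


def extract_signatures(source: str) -> list[str]:
    """Return one line per top-level function or class, plus one per method."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, _FUNCTION_NODES):
            lines.append(_format_function(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, _FUNCTION_NODES):
                    lines.append("    " + _format_function(item))
    return lines


def extract_imports(source: str) -> list[str]:
    """Return the sorted set of imported module names (edges of the import graph)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return sorted(names)
```

Feeding these one-line signatures to the planner instead of full function bodies is what keeps the injected context compact.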


This is the /validate pattern for automatically verifying whether a generated spec.md is sufficient for a coder agent to execute. It implements the “3 retries → human escalation” flow designed in Week 7.

VALIDATION_CHECKLIST = [
    ("frontmatter_valid", lambda s: bool(s.get("title") and s.get("description"))),
    ("has_what_and_why", lambda s: "what" in s.get("description", "").lower()
                                   or "why" in s.get("description", "").lower()),
    # An empty criteria list must fail: all() over an empty list is True
    ("ac_specific", lambda s: bool(s.get("acceptance_criteria")) and all(
        len(ac) > 20 and not ac.endswith("...")
        for ac in s.get("acceptance_criteria", [])
    )),
    ("no_impl_details", lambda s: not any(
        kw in str(s.get("acceptance_criteria", ""))
        for kw in ["import", "def ", "class ", "```"]
    )),
]


class SpecValidator:
    def __init__(self, client: anthropic.Anthropic, max_retries: int = 3):
        self.client = client
        self.max_retries = max_retries

    def validate(self, spec: dict) -> tuple[bool, list[str]]:
        failures = []
        for name, check_fn in VALIDATION_CHECKLIST:
            try:
                if not check_fn(spec):
                    failures.append(name)
            except Exception as e:
                failures.append(f"{name}: error({e})")
        return len(failures) == 0, failures

    def validate_with_retry(self, spec: dict, requirement: str) -> dict:
        """Validate, regenerating on failure. Human escalation after max_retries failures."""
        failures: list[str] = []
        for attempt in range(1, self.max_retries + 1):
            passed, failures = self.validate(spec)
            if passed:
                return spec
            print(f"[attempt {attempt}/{self.max_retries}] validation failed: {failures}")
            # Regenerate only if another validation pass remains; otherwise
            # the regenerated spec would be discarded without being checked
            if attempt < self.max_retries:
                spec = self._regenerate_spec(spec, requirement, failures)
        # All attempts failed → human escalation
        raise ValueError(
            f"Spec auto-validation failed ({self.max_retries} attempts). Manual review required.\n"
            f"Last failures: {failures}"
        )

    def _regenerate_spec(self, spec: dict, requirement: str,
                         failures: list[str]) -> dict:
        feedback = "\n".join(f"- {f}" for f in failures)
        response = self.client.messages.create(
            model=os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-5"),
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": (
                    f"The following validation items failed for this specification:\n{feedback}\n\n"
                    # Include the current spec so the model revises it instead of starting over
                    f"Current specification (JSON):\n{json.dumps(spec, indent=2)}\n\n"
                    f"Original requirement: {requirement}\n\n"
                    f"Revise the specification to pass validation. Output JSON."
                )
            }]
        )
        return json.loads(response.content[0].text)
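
The retry-then-escalate flow can be exercised without calling the API if the validation and regeneration steps are injected as plain functions. The sketch below shows one way to do that. `run_gate`, `EscalationRequired`, and both callables are hypothetical names introduced for illustration, and a dedicated exception type is an assumption about how a caller might want to route the failure to a human.

```python
from typing import Callable


class EscalationRequired(Exception):
    """Raised when automatic regeneration is exhausted and a human must review."""

    def __init__(self, failures: list[str], attempts: int):
        super().__init__(f"validation failed after {attempts} attempts: {failures}")
        self.failures = failures
        self.attempts = attempts


def run_gate(
    spec: dict,
    validate: Callable[[dict], list[str]],
    regenerate: Callable[[dict, list[str]], dict],
    max_attempts: int = 3,
) -> dict:
    """Validate up to max_attempts times, regenerating between failed attempts."""
    failures: list[str] = []
    for attempt in range(1, max_attempts + 1):
        failures = validate(spec)  # an empty list means the spec passed
        if not failures:
            return spec
        if attempt < max_attempts:
            spec = regenerate(spec, failures)
    raise EscalationRequired(failures, max_attempts)
```

In a test, `regenerate` can be a stub that patches the spec deterministically, which makes the escalation path reproducible. In production it would wrap the LLM call.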

Q1. Detecting Ambiguity

Where is the boundary between “ambiguous” and “specific” for acceptance criteria in a planner-generated spec.md? Can ambiguity be detected automatically? Propose a mechanical definition of ambiguity.

Q2. Benefits of Separation

sdlc-toolkit intentionally separates requirements specification (/spec) from architecture design (/architect). What are the benefits of this separation? What problems arise if the two stages are merged into one?

Q3. Context Rot Trade-offs

Analyze the trade-offs between injecting the entire codebase into context versus injecting a summary, from the perspective of Context Rot as covered in Week 5. At what point should you switch to a summarization strategy?

Q4. MetaGPT vs sdlc-toolkit

Compare MetaGPT’s PM role with the /spec skill in sdlc-toolkit. Which is more deterministic? What design principles increase determinism?


  1. Implement the Planner Agent — Extend the code above as a starting point

  2. Enhance Codebase Analysis — Go beyond a simple file list to analyze functions and classes

  3. spec.md Generation Pipeline — Apply it to a real project (the calculator from Lab 04)

  4. Validate the Generated spec.md — Verify that a Coder Agent can actually execute the tasks


  • Planner = the pipeline’s first gate. Specification quality determines overall success rate. Backed by empirical MetaGPT data (correlation coefficient 0.72).
  • 2-stage separation: /spec (what) → /architect (how). Applying the Separation of Concerns principle to agent design.
  • 3-stage codebase analysis: structure (file tree) → semantics (signatures, import graph) → context (conventions, patterns). Injecting the full code causes Context Rot.
  • Validation gate: automatic checklist (4 items) + 3 retries + human escalation. Code implementation of the Week 7 design.
  • Dependency DAG: the dependencies array in TASK files and tier calculation implement the parallelization design from Week 7. Tier 1 tasks can run concurrently.

MetaGPT (ICLR 2024)

SOP-based multi-agent framework. Empirical demonstration of the PM → Architect → Engineer sequence and the correlation between document quality and code quality. The key reference for planner design.

SWE-agent (Princeton, 2024)

Automated codebase exploration and planning. Presents methods for LLMs to effectively navigate file systems through Agent-Computer Interface (ACI) design.

Anthropic Claude Code Agent Tool

Sub-agent spawning, model routing, and worktree isolation. Real-world patterns for implementing planner-coder separation in production.

sdlc-toolkit /spec + /architect

Production implementation of the requirements-architecture separation pattern. A comprehensive reference including LESSON-file-based learning from past failures, the validate gate, and TASK DAG generation.


Submission deadline: 2026-04-28 23:59

Requirements:

  1. Working PlannerAgent implementation (planner_agent.py)
  2. Codebase analysis capability at the function/class level
  3. A spec.md generated by applying it to a real project
  4. End-to-end demonstration of extending the Lab 04 calculator project with the planner