Week 8: Implementing the Planning Agent

Phase 3 · Week 8 · Advanced · Lecture: 2026-04-21

Theory

Learning Objectives

Concepts

Explain why the planner is the bottleneck of a 9-phase SDLC and classify the design decisions a worker alone cannot absorb.

Design

Author a spec.md schema (objective · scope · constraints · acceptance · escalation) and define its validation gate.

Implementation

Implement the planner with codebase analysis and spec generation as separate stages, and automate spec validation.

Operations

Detect spec drift (requirements vs implementation) and run a fix-flow checklist when it appears.

Why the Planner Agent Is the Pipeline Bottleneck

Recall the 9-phase agentic SDLC we designed in Week 7. Phases 1–3 (requirements gathering → architecture design → task decomposition) fall entirely within the planner’s domain. The quality of the artifacts produced in these three phases determines the success rate of Phases 4–9 as a whole.

Intuitively: no matter how capable the coder agent is, if the spec.md it receives contains nothing more than “add user authentication” in a single line, it cannot generate correct code. Conversely, when acceptance criteria are clear and testable, even a small model can generate adequate code.

MetaGPT (ICLR 2024) demonstrates this empirically. Across the PM → Architect → Engineer sequence, it reports a correlation of 0.72 between the quality of SOP (Standard Operating Procedure) documents generated by each role and the quality of the final code. Notably, the clarity of the PRD (Product Requirements Document) written by the PM role was the strongest predictor.

The /spec skill in sdlc-toolkit implements this insight: before processing a new requirement, it greps the project's existing LESSON files and injects past failure patterns into context. Avoiding repeated mistakes is what distinguishes a deterministic pipeline from a simple LLM call.
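The lesson-injection step can be sketched in a few lines. This is a minimal illustration, not the /spec skill's actual code: it assumes lessons live in files matching `LESSON*.md` and that relevance is a case-insensitive keyword match (like `grep -i`). The function name `collect_lessons`, its parameters, and the prompt header are hypothetical.

```python
import tempfile
from pathlib import Path

def collect_lessons(project_root: Path, keywords: list[str],
                    pattern: str = "LESSON*.md", max_lines: int = 20) -> str:
    """Return a prompt fragment with LESSON lines that mention any keyword.

    Hypothetical helper: the file pattern and matching rule are assumptions.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for path in sorted(project_root.rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            # Case-insensitive substring match, like `grep -i`
            if any(k in line.lower() for k in lowered):
                hits.append(f"{path.name}: {line.strip()}")
    if not hits:
        return ""
    body = "\n".join(f"- {h}" for h in hits[:max_lines])
    return f"Past failure patterns to avoid:\n{body}"

# Usage against a throwaway project directory
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "LESSON-auth.md").write_text(
        "JWT secret was hard-coded in settings.py\n"
        "Docs build took too long\n",
        encoding="utf-8",
    )
    lessons = collect_lessons(root, ["jwt", "token"])
    no_match = collect_lessons(root, ["kafka"])

print(lessons)
```

The returned string would be prepended to the spec-generation prompt; an empty string means no relevant lesson was found, so nothing is injected.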

The Role of the Planner Agent

The planner agent transforms vague human requirements into concrete tasks that a coder agent can process. This transformation is not a single step — it is implemented as a 2-stage separation pattern.

2-Stage Separation Pattern: /spec → /architect

sdlc-toolkit intentionally separates requirements specification from architecture design. This separation applies the Separation of Concerns principle from software engineering to agent design.

/spec — “What”: requirements, acceptance criteria, constraints, and out-of-scope items. Implementation details are not mentioned.
/architect — “How”: architecture decisions, module decomposition, TASK file generation, and dependency DAG construction.

The practical benefit of this separation: during the spec phase, the LLM can focus on the completeness of requirements without getting buried in implementation details. During the architect phase, it searches for the optimal structure given already-confirmed requirements.
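The separation can be made concrete by giving each stage its own output type and passing only the confirmed spec forward. The sketch below is illustrative: the `Spec`/`Architecture` dataclasses, the prompts, and the stage function names are assumptions (not sdlc-toolkit's prompts), and the LLM is a stub so the example runs offline.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable

# Any function that takes a prompt and returns the model's text reply (assumed interface).
LLM = Callable[[str], str]

@dataclass
class Spec:                      # /spec output: the "what"
    objective: str
    acceptance_criteria: list[str]
    constraints: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

@dataclass
class Architecture:              # /architect output: the "how"
    decisions: list[str]
    tasks: list[dict]

def spec_stage(requirement: str, llm: LLM) -> Spec:
    prompt = ("List the objective, acceptance criteria, constraints and out-of-scope "
              "items for this requirement as JSON. Do not mention implementation.\n"
              f"Requirement: {requirement}")
    return Spec(**json.loads(llm(prompt)))

def architect_stage(spec: Spec, llm: LLM) -> Architecture:
    # Only the confirmed Spec is passed on; the raw requirement is not.
    prompt = ("Design modules and TASK entries (with dependencies) that satisfy "
              f"this spec. Reply as JSON.\nSpec: {json.dumps(asdict(spec))}")
    return Architecture(**json.loads(llm(prompt)))

def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model call (illustrative replies only)."""
    if prompt.startswith("List the objective"):
        return json.dumps({"objective": "Users can log in",
                           "acceptance_criteria": ["Valid credentials return a token",
                                                   "Invalid credentials return HTTP 401"],
                           "out_of_scope": ["OAuth providers"]})
    return json.dumps({"decisions": ["Stateless JWT sessions"],
                       "tasks": [{"id": "task-001", "dependencies": []}]})

spec = spec_stage("Add user authentication", fake_llm)
arch = architect_stage(spec, fake_llm)
print(spec.objective, "->", [t["id"] for t in arch.tasks])
```

Because the architect stage only ever sees a `Spec`, any requirement that was not captured at the spec stage cannot leak into the design unnoticed; it has to be added to the spec first.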

Flow: Requirement Spec → Architecture Document + TASK files (with dependency DAG)

“Add user authentication” (vague human requirement)

↓ Planner Agent

spec.md

task-001: JWT token generation module (auth/jwt.py)
task-002: Password hashing utility (auth/hash.py)
task-003: Login API endpoint (api/auth.py:45-80)
task-004: Authentication middleware (middleware/auth.py)

Each task includes acceptance criteria

Planner Agent Implementation

import os
import anthropic
import json
from pathlib import Path

SYSTEM_PROMPT = """You are a planner agent acting as a software architect.
Analyze the given requirements and codebase, then generate a concrete, actionable task list.

Output format: a single JSON object and nothing else (no prose, no Markdown fences)
{
  "out_of_scope": ["items explicitly excluded from this work"],
  "tasks": [
    {
      "id": "task-XXX",
      "description": "Specific implementation details",
      "target_files": ["file paths"],
      "dependencies": ["task-XXX"],
      "assumptions": ["assumptions this task relies on"],
      "acceptance_criteria": ["verification conditions"]
    }
  ]
}"""

class PlannerAgent:
    def __init__(self):
        self.client = anthropic.Anthropic()

    def analyze_codebase(self, project_root: Path) -> str:
        """Summarize codebase structure as text (sorted so the prompt is deterministic)"""
        structure = []
        for f in sorted(project_root.rglob("*.py")):
            structure.append(f"- {f.relative_to(project_root).as_posix()}")
        return "\n".join(structure)

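    def plan(self, requirement: str, project_root: Path) -> dict:
        codebase = self.analyze_codebase(project_root)

        response = self.client.messages.create(
            model=os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-5"),
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            messages=[{
                "role": "user",
                "content": f"Requirement: {requirement}\n\nCodebase:\n{codebase}"
            }]
        )

        # Models sometimes wrap JSON in Markdown code fences or prose;
        # keep only the outermost JSON object before parsing
        text = response.content[0].text
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end == -1:
            raise ValueError(f"Planner returned no JSON object: {text[:200]!r}")
        return json.loads(text[start:end + 1])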

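    def generate_spec_md(self, plan: dict, output_path: Path):
        """Convert JSON plan to a Markdown specification document"""
        output_path.parent.mkdir(parents=True, exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write("# Project Specification\n\n")
            for task in plan['tasks']:
                f.write(f"## {task['id']}: {task['description']}\n")
                f.write(f"- Target files: {', '.join(task.get('target_files', []))}\n")
                f.write("- Acceptance criteria:\n")
                for criterion in task.get('acceptance_criteria', []):
                    f.write(f"  - [ ] {criterion}\n")
                f.write("\n")
            # Record exclusions so the coder agent does not over-build
            if plan.get('out_of_scope'):
                f.write("## Out of Scope\n")
                for item in plan['out_of_scope']:
                    f.write(f"- {item}\n")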

Specification Validation and the Dependency DAG

After the basic implementation, two core capabilities must be added: automatic validation of specification quality and construction of the task dependency DAG.

def validate_spec(self, plan: dict) -> tuple[bool, list[str]]:
    """Automatically validate the quality of a generated specification"""
    issues = []

    # Spec-level check, done once outside the task loop so that
    # reporting it does not cut the per-task checks short
    if 'out_of_scope' not in plan:
        issues.append("spec-level out_of_scope not defined")

    tasks = plan.get('tasks', [])
    if not tasks:
        issues.append("no tasks defined")

    for task in tasks:
        task_id = task.get('id', '<missing id>')
        # Validate completeness of acceptance criteria
        if not task.get('acceptance_criteria'):
            issues.append(f"{task_id}: acceptance_criteria missing")
        elif len(task['acceptance_criteria']) < 2:
            issues.append(f"{task_id}: at least 2 acceptance_criteria required")

        # Check for presence of assumptions
        if 'assumptions' not in task:
            issues.append(f"{task_id}: assumptions field missing")

    return len(issues) == 0, issues


def create_task_dag(self, plan: dict, output_dir: Path) -> dict[str, int]:
    """Generate TASK files and build the dependency DAG (implements Week 7 tier concept)"""
    task_map = {t['id']: t for t in plan['tasks']}

    # Tier calculation: tasks with no dependencies = tier 1,
    # tasks with dependencies = max(dep_tier) + 1
    tiers = {}
    visiting = set()  # tasks on the current recursion path, for cycle detection

    def get_tier(task_id: str) -> int:
        if task_id in tiers:
            return tiers[task_id]
        if task_id not in task_map:
            raise ValueError(f"unknown dependency: {task_id}")
        if task_id in visiting:
            raise ValueError(f"dependency cycle detected at {task_id}")
        visiting.add(task_id)
        deps = task_map[task_id].get('dependencies', [])
        tiers[task_id] = max((get_tier(d) for d in deps), default=0) + 1
        visiting.discard(task_id)
        return tiers[task_id]

    for task in plan['tasks']:
        get_tier(task['id'])

    # Generate TASK files (dependencies are written as a JSON list, which is valid YAML)
    output_dir.mkdir(parents=True, exist_ok=True)
    for task in plan['tasks']:
        task_file = output_dir / f"{task['id']}.md"
        with open(task_file, 'w', encoding='utf-8') as f:
            f.write(f"---\nid: {task['id']}\ntier: {tiers[task['id']]}\n")
            f.write(f"dependencies: {json.dumps(task.get('dependencies', []))}\n---\n\n")
            f.write(f"## {task['description']}\n\n")
            f.write("### Acceptance Criteria\n")
            for ac in task['acceptance_criteria']:
                f.write(f"- [ ] {ac}\n")

    return tiers
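The same tiers can be computed iteratively with Kahn's algorithm (a layered topological sort), which avoids Python's recursion limit on long dependency chains and reports cycles naturally. The sketch below is a standalone alternative, not part of the planner above; the function name `tier_batches` is hypothetical, and the dependencies given to the four authentication tasks are assumed for illustration, since the example plan does not specify them.

```python
def tier_batches(tasks: list[dict]) -> list[list[str]]:
    """Group task ids into tiers with Kahn's algorithm (layered topological sort).

    Tier n holds tasks whose dependencies all sit in tiers < n, so every task
    in one batch could run concurrently once the earlier batches finish.
    """
    ids = {t["id"] for t in tasks}
    remaining = {t["id"]: set(t.get("dependencies", [])) for t in tasks}
    for tid, deps in remaining.items():
        unknown = deps - ids
        if unknown:
            raise ValueError(f"{tid} depends on unknown tasks: {sorted(unknown)}")

    batches = []
    while remaining:
        # Every task with no unfinished dependency is ready in this tier
        ready = sorted(tid for tid, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        batches.append(ready)
        for tid in ready:
            del remaining[tid]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches

# Illustrative dependencies for the authentication example (assumed, not from a real plan)
auth_tasks = [
    {"id": "task-001", "dependencies": []},                        # JWT module
    {"id": "task-002", "dependencies": []},                        # password hashing
    {"id": "task-003", "dependencies": ["task-001", "task-002"]},  # login endpoint
    {"id": "task-004", "dependencies": ["task-001"]},              # auth middleware
]
print(tier_batches(auth_tasks))
# [['task-001', 'task-002'], ['task-003', 'task-004']]
```

Under these assumed dependencies, the JWT and hashing tasks form tier 1 and could be handed to two coder agents in parallel; the endpoint and middleware form tier 2.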

Advanced Codebase Analysis

The existing analyze_codebase() returns only a file list. The context a coder agent actually needs is far richer.

Let’s look at the 3-stage analysis pattern used by the /architect skill in sdlc-toolkit.

def analyze_structure(self, project_root: Path) -> dict:
    """Build file/directory/module tree"""
    structure = {"files": [], "directories": set(), "modules": []}

    for f in sorted(project_root.rglob("*.py")):
        rel = f.relative_to(project_root)
        structure["files"].append(rel.as_posix())
        structure["directories"].add(rel.parent.as_posix())

        # Directories containing __init__.py = Python packages.
        # Joining path parts (not replacing "/") keeps dotted names correct on Windows;
        # the project root itself (empty parts) is skipped.
        if rel.parent.parts and (f.parent / "__init__.py").exists():
            module = ".".join(rel.parent.parts)
            if module not in structure["modules"]:
                structure["modules"].append(module)

    structure["directories"] = sorted(structure["directories"])
    return structure

import ast

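def analyze_semantics(self, project_root: Path) -> dict:
    """Extract function/class signatures and the import graph"""
    symbols = {}
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

    for f in sorted(project_root.rglob("*.py")):
        rel = f.relative_to(project_root).as_posix()
        try:
            tree = ast.parse(f.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue

        file_symbols = {"classes": [], "functions": [], "imports": []}

        # Top-level definitions only: iterate tree.body instead of walking the whole AST,
        # so nested helpers are not reported as methods or module functions
        for node in tree.body:
            if isinstance(node, ast.ClassDef):
                methods = [n.name for n in node.body if isinstance(n, func_types)]
                file_symbols["classes"].append(
                    {"name": node.name, "methods": methods}
                )
            elif isinstance(node, func_types):
                args = [a.arg for a in node.args.args]
                file_symbols["functions"].append(
                    {"name": node.name, "args": args, "line": node.lineno}
                )

        # Imports anywhere in the file become import-graph edges (module names, not AST dumps)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                file_symbols["imports"].extend(a.name for a in node.names)
            elif isinstance(node, ast.ImportFrom):
                # level > 0 marks a relative import; keep the leading dots
                file_symbols["imports"].append("." * node.level + (node.module or ""))

        symbols[rel] = file_symbols

    return symbols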

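def analyze_context(self, project_root: Path) -> dict:
    """Extract existing conventions and architectural patterns"""
    context = {
        "naming_convention": None,
        "test_framework": None,
        "patterns": [],        # placeholder: fill with detected architectural patterns
        "entry_points": []
    }

    # Detect file naming convention (snake_case vs camelCase).
    # Single lowercase words such as "main" fit both styles, so they are ignored.
    stems = [f.stem for f in project_root.rglob("*.py")]
    snake = sum(1 for s in stems if "_" in s.strip("_"))
    camel = sum(1 for s in stems if "_" not in s and s != s.lower())
    if snake or camel:
        context["naming_convention"] = "snake_case" if snake >= camel else "camelCase"

    # Detect test framework. test_*.py files alone are ambiguous (pytest collects
    # them too), so unittest is reported only when a test module mentions it.
    if (project_root / "pytest.ini").exists() or \
       any(project_root.rglob("conftest.py")):
        context["test_framework"] = "pytest"
    elif any("unittest" in t.read_text(encoding="utf-8", errors="ignore")
             for t in project_root.rglob("test_*.py")):
        context["test_framework"] = "unittest"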

    # Detect entry points
    for candidate in ["main.py", "app.py", "run.py", "__main__.py"]:
        if (project_root / candidate).exists():
            context["entry_points"].append(candidate)

    return context

An analyze_codebase_full() method that integrates all three stages summarizes each stage’s output into a compact representation for injection into the LLM. Injecting a structural summary rather than the full source shrinks the context by more than 10×, with the exact ratio depending on how much of the codebase lives inside function bodies.
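The body of analyze_codebase_full() is not shown here; the sketch below illustrates one way its rendering step could look. It assumes the dict shapes returned by the three stage functions above (files/directories/modules; classes/functions/imports per file; naming_convention/test_framework/entry_points). The name `summarize_analysis` and the `max_files` cap are hypothetical. The savings come from keeping only signatures and import names, never function bodies.

```python
def summarize_analysis(structure: dict, symbols: dict, context: dict,
                       max_files: int = 50) -> str:
    """Render the three analysis stages as a compact text block for the prompt.

    Hypothetical helper: keeps signatures only and caps the number of files listed.
    """
    lines = ["## Context"]
    lines.append(f"naming: {context.get('naming_convention')}, "
                 f"tests: {context.get('test_framework')}, "
                 f"entry points: {', '.join(context.get('entry_points', [])) or 'none'}")
    lines.append("## Packages")
    lines.append(", ".join(structure.get("modules", [])) or "none")
    lines.append("## Symbols")
    for path in sorted(symbols)[:max_files]:
        info = symbols[path]
        lines.append(path)
        for fn in info.get("functions", []):
            lines.append(f"  def {fn['name']}({', '.join(fn['args'])})  # line {fn['line']}")
        for cls in info.get("classes", []):
            lines.append(f"  class {cls['name']}: {', '.join(cls['methods'])}")
        if info.get("imports"):
            lines.append(f"  imports: {', '.join(sorted(set(info['imports'])))}")
    if len(symbols) > max_files:
        lines.append(f"... {len(symbols) - max_files} more files omitted")
    return "\n".join(lines)

# Illustrative stage outputs for a small calculator project (made-up values)
structure = {"files": ["calc/core.py"], "directories": ["calc"], "modules": ["calc"]}
symbols = {"calc/core.py": {
    "classes": [{"name": "Calculator", "methods": ["add", "sub"]}],
    "functions": [{"name": "parse", "args": ["expr"], "line": 3}],
    "imports": ["math"],
}}
context = {"naming_convention": "snake_case", "test_framework": "pytest",
           "entry_points": ["main.py"]}
summary = summarize_analysis(structure, symbols, context)
print(summary)
```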

The Specification Validation Gate

This is the /validate pattern for automatically verifying whether a generated spec.md is sufficient for a coder agent to execute. It implements the “3 retries → human escalation” flow designed in Week 7.

VALIDATION_CHECKLIST = [
    ("frontmatter_valid",    lambda s: bool(s.get("title") and s.get("description"))),
    ("has_what_and_why",     lambda s: "what" in s.get("description", "").lower()
                                       and "why" in s.get("description", "").lower()),
    # all() is True for an empty list, so require at least one criterion explicitly
    ("ac_specific",          lambda s: bool(s.get("acceptance_criteria")) and all(
                                 len(ac) > 20 and not ac.endswith("...")
                                 for ac in s["acceptance_criteria"]
                             )),
    # "import " with a trailing space avoids false hits on words like "important"
    ("no_impl_details",      lambda s: not any(
                                 kw in str(s.get("acceptance_criteria", ""))
                                 for kw in ["import ", "def ", "class ", "```"]
                             )),
]

class SpecValidator:
    def __init__(self, client: anthropic.Anthropic, max_retries: int = 3):
        self.client = client
        self.max_retries = max_retries

    def validate(self, spec: dict) -> tuple[bool, list[str]]:
        failures = []
        for name, check_fn in VALIDATION_CHECKLIST:
            try:
                if not check_fn(spec):
                    failures.append(name)
            except Exception as e:
                failures.append(f"{name}: error({e})")
        return len(failures) == 0, failures

    def validate_with_retry(self, spec: dict, requirement: str) -> dict:
        """Validate and auto-regenerate. Human escalation after max_retries failures."""
        failures: list[str] = []
        for attempt in range(1, self.max_retries + 1):
            passed, failures = self.validate(spec)
            if passed:
                return spec

            print(f"[attempt {attempt}/{self.max_retries}] validation failed: {failures}")

            # Regenerate only if another attempt remains, so every
            # regenerated spec is validated before escalation
            if attempt < self.max_retries:
                spec = self._regenerate_spec(spec, requirement, failures)

        # All attempts failed → human escalation
        raise ValueError(
            f"Spec auto-validation failed ({self.max_retries} attempts). Manual review required.\n"
            f"Last failures: {failures}"
        )

    def _regenerate_spec(self, spec: dict, requirement: str,
                         failures: list[str]) -> dict:
        feedback = "\n".join(f"- {f}" for f in failures)
        response = self.client.messages.create(
            model=os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-5"),
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": (
                    f"The following validation items failed for this specification:\n{feedback}\n\n"
                    f"Current specification:\n{json.dumps(spec, indent=2)}\n\n"
                    f"Original requirement: {requirement}\n\n"
                    f"Revise the specification to pass validation. "
                    f"Output only the JSON object, with no surrounding prose."
                )
            }]
        )
        # Keep only the outermost JSON object in case the reply is wrapped in fences or prose
        text = response.content[0].text
        return json.loads(text[text.find("{"):text.rfind("}") + 1])
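Because SpecValidator is coupled to the Anthropic client, its retry logic is hard to exercise in a unit test. A minimal offline sketch of the same "retry → human escalation" control flow follows, assuming the checklist and the regenerator are injected as plain callables. The names `retry_then_escalate` and `EscalationRequired`, and both stub functions, are hypothetical; the stubs stand in for the checklist and the LLM.

```python
from typing import Callable

class EscalationRequired(Exception):
    """Raised when automatic repair is exhausted and a human must review the spec."""

def retry_then_escalate(spec: dict,
                        validate: Callable[[dict], list[str]],
                        regenerate: Callable[[dict, list[str]], dict],
                        max_attempts: int = 3) -> dict:
    """Validate up to max_attempts times, regenerating between failed attempts."""
    failures: list[str] = []
    for attempt in range(1, max_attempts + 1):
        failures = validate(spec)              # empty list means the spec passed
        if not failures:
            return spec
        if attempt < max_attempts:
            spec = regenerate(spec, failures)  # failures double as repair feedback
    raise EscalationRequired(f"{max_attempts} attempts failed; last failures: {failures}")

# Offline stubs standing in for the checklist and the LLM (illustrative only)
def needs_two_criteria(spec: dict) -> list[str]:
    return [] if len(spec.get("acceptance_criteria", [])) >= 2 else ["ac_count"]

def add_one_criterion(spec: dict, failures: list[str]) -> dict:
    criteria = list(spec.get("acceptance_criteria", []))
    criteria.append(f"criterion {len(criteria) + 1}")
    return {**spec, "acceptance_criteria": criteria}

fixed = retry_then_escalate({"acceptance_criteria": []},
                            needs_two_criteria, add_one_criterion)
print(fixed["acceptance_criteria"])  # ['criterion 1', 'criterion 2']

try:
    retry_then_escalate({}, lambda s: ["always_fails"], add_one_criterion)
except EscalationRequired as e:
    print("escalated:", e)
```

A dedicated exception type lets the orchestrator distinguish "needs a human" from ordinary errors such as network failures, which should be retried or reported differently.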

In-Class Discussion Questions

Q1. Detecting Ambiguity

Where is the boundary between “ambiguous” and “specific” for acceptance criteria in a planner-generated spec.md? Can ambiguity be detected automatically? Propose a mechanical definition of ambiguity.

Q2. Benefits of Separation

sdlc-toolkit intentionally separates requirements specification (/spec) from architecture design (/architect). What are the benefits of this separation? What problems arise if the two stages are merged into one?

Q3. Context Rot Trade-offs

Analyze the trade-offs between injecting the entire codebase into context versus injecting a summary, from the perspective of Context Rot as covered in Week 5. At what point should you switch to a summarization strategy?

Q4. MetaGPT vs sdlc-toolkit

Compare MetaGPT’s PM role with the /spec skill in sdlc-toolkit. Which is more deterministic? What design principles increase determinism?

Practicum

Implement the Planner Agent — Extend the code above as a starting point
Enhance Codebase Analysis — Go beyond a simple file list to analyze functions and classes
spec.md Generation Pipeline — Apply it to a real project (the calculator from Lab 04)
Validate the Generated spec.md — Verify that a Coder Agent can actually execute the tasks

Key Takeaways

Planner = the pipeline’s first gate. Specification quality determines overall success rate. Backed by empirical MetaGPT data (correlation coefficient 0.72).
2-stage separation: /spec (what) → /architect (how). Applying the Separation of Concerns principle to agent design.
3-stage codebase analysis: structure (file tree) → semantics (signatures, import graph) → context (conventions, patterns). Injecting the full code causes Context Rot.
Validation gate: automatic checklist (4 items) + 3 retries + human escalation. Code implementation of the Week 7 design.
Dependency DAG: the dependencies array in TASK files and tier calculation implement the parallelization design from Week 7. Tier 1 tasks can run concurrently.

Assignment

Lab 08: Planning Agent Implementation

Submission deadline: 2026-04-28 23:59

Requirements:

Working PlannerAgent implementation (planner_agent.py)
Codebase analysis capability at the function/class level
A spec.md generated by applying it to a real project
End-to-end demonstration of extending the Lab 04 calculator project with the planner