Concepts
Explain why the planner is the bottleneck of a 9-phase SDLC and classify the design decisions a worker alone cannot absorb.
Design
Author a spec.md schema (objective · scope · constraints · acceptance · escalation) and define its validation gate.
Implementation
Implement the planner with codebase-analysis and spec-generation as separate stages, and automate spec validation.
Operations
Detect spec drift (requirements vs implementation) and run a fix-flow checklist when it appears.
Recall the 9-phase agentic SDLC we designed in Week 7. Phases 1–3 (requirements gathering → architecture design → task decomposition) fall entirely within the planner’s domain. The quality of the artifacts produced in these three phases determines the success rate of Phases 4–9 as a whole.
Intuitively: no matter how capable the coder agent is, if the spec.md it receives contains nothing more than “add user authentication” in a single line, it cannot generate correct code. Conversely, when acceptance criteria are clear and testable, even a small model can generate adequate code.
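To make the contrast concrete, the following is a minimal sketch of a spec record built from the five fields named in the Design objective (objective, scope, constraints, acceptance, escalation). The `Spec` class, the `gate` function, the threshold of two criteria, and the `/login` examples are illustrative assumptions, not the sdlc-toolkit schema.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical in-memory form of spec.md (field names are assumptions)."""
    objective: str
    scope: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    acceptance: list[str] = field(default_factory=list)
    escalation: str = ""


def gate(spec: Spec, min_criteria: int = 2) -> list[str]:
    """Return the reasons a spec is not ready for a coder agent (empty = pass)."""
    problems = []
    if not spec.scope:
        problems.append("scope is empty")
    if len(spec.acceptance) < min_criteria:
        problems.append(f"fewer than {min_criteria} acceptance criteria")
    if not spec.escalation:
        problems.append("no escalation path")
    return problems


# The one-line requirement from the text: nothing but an objective
one_liner = Spec(objective="add user authentication")

# The same objective with testable criteria (endpoints are made up for illustration)
detailed = Spec(
    objective="add user authentication",
    scope=["email/password login", "out of scope: OAuth providers"],
    constraints=["passwords are stored hashed, never in plaintext"],
    acceptance=[
        "POST /login with valid credentials returns 200 and a session token",
        "POST /login with a wrong password returns 401",
    ],
    escalation="ask a human if the session lifetime is not specified",
)

print(gate(one_liner))
print(gate(detailed))
```

The one-line spec fails all three checks, while the detailed one passes. A gate like this only checks that the fields are present; it says nothing about whether the criteria are correct.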
MetaGPT (ICLR 2024) demonstrates this empirically. Across the PM → Architect → Engineer sequence, it reports a correlation of 0.72 between the quality of SOP (Standard Operating Procedure) documents generated by each role and the quality of the final code. Notably, the clarity of the PRD (Product Requirements Document) written by the PM role was the strongest predictor.
The /spec skill in sdlc-toolkit implements this insight. Before processing a new requirement, it greps existing LESSON files in the project to inject past failure patterns into context. Not repeating the same mistakes is what separates a deterministic pipeline from a simple LLM call.
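The LESSON lookup can be approximated with a plain keyword scan. This is only a sketch of the idea: the `LESSON*.md` file pattern, the function names, and the prompt layout are assumptions, and the actual /spec skill may select and format lessons differently.

```python
from pathlib import Path


def collect_lessons(project_root: Path, keywords: list[str], max_chars: int = 2000) -> str:
    """Gather lines from LESSON files that mention any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    hits = []
    # Assumption: lesson files match LESSON*.md somewhere under the project root
    for path in sorted(project_root.rglob("LESSON*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if any(k in line.lower() for k in wanted):
                hits.append(f"[{path.name}] {line.strip()}")
    # Cap the injected text so lessons cannot crowd out the requirement itself
    return "\n".join(hits)[:max_chars]


def build_prompt(requirement: str, lessons: str) -> str:
    """Prepend past failure patterns to the requirement before calling the LLM."""
    if not lessons:
        return f"Requirement: {requirement}"
    return f"Past failure patterns:\n{lessons}\n\nRequirement: {requirement}"
```

For example, `build_prompt(req, collect_lessons(Path("."), ["auth", "token"]))` would yield a user message that leads with any matching lesson lines.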
The planner agent transforms vague human requirements into concrete tasks that a coder agent can process. This transformation is not a single step; it is implemented as a 2-stage separation pattern.
2-Stage Separation Pattern: /spec → /architect
sdlc-toolkit intentionally separates requirements specification from architecture design. This separation applies the Separation of Concerns principle from software engineering to agent design.
/spec (“What”): requirements, acceptance criteria, constraints, and out-of-scope items. Implementation details are not mentioned.
/architect (“How”): architecture decisions, module decomposition, TASK file generation, and dependency DAG construction.
The practical benefit of this separation: during the spec phase, the LLM can focus on the completeness of requirements without getting buried in implementation details. During the architect phase, it searches for the optimal structure given already-confirmed requirements.
Flow: Requirement Spec → Architecture Document + TASK files (with dependency DAG)
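The dependency DAG at the end of this flow determines which tasks may run side by side. As a rough sketch, the standard library's `graphlib.TopologicalSorter` can group task ids into tiers. This is a swapped-in technique for illustration, and the `tier_batches` helper and the sample task ids are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def tier_batches(tasks: list[dict]) -> list[list[str]]:
    """Group task ids into tiers; every task in a tier depends only on earlier tiers."""
    graph = {t["id"]: set(t.get("dependencies", [])) for t in tasks}
    unknown = {d for deps in graph.values() for d in deps} - set(graph)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole tier finished before asking again
    return batches


tasks = [
    {"id": "task-001", "dependencies": []},
    {"id": "task-002", "dependencies": []},
    {"id": "task-003", "dependencies": ["task-001", "task-002"]},
    {"id": "task-004", "dependencies": ["task-003"]},
]
print(tier_batches(tasks))
```

Each inner list is one tier: tasks inside it have no dependencies on each other, so a scheduler could dispatch them concurrently and wait for the tier to finish before starting the next.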
import json
import os
from pathlib import Path

import anthropic

SYSTEM_PROMPT = """You are a planner agent acting as a software architect.
Analyze the given requirements and codebase, then generate a concrete, actionable task list.

Output format: JSON
{
  "tasks": [
    {
      "id": "task-XXX",
      "description": "Specific implementation details",
      "target_files": ["file paths"],
      "dependencies": ["task-XXX"],
      "acceptance_criteria": ["verification conditions"]
    }
  ]
}"""


class PlannerAgent:
    def __init__(self):
        # Reads ANTHROPIC_API_KEY from the environment
        self.client = anthropic.Anthropic()

    def analyze_codebase(self, project_root: Path) -> str:
        """Summarize codebase structure as text"""
        structure = []
        for f in project_root.rglob("*.py"):
            structure.append(f"- {f.relative_to(project_root)}")
        return "\n".join(structure)

    def plan(self, requirement: str, project_root: Path) -> dict:
        codebase = self.analyze_codebase(project_root)

        response = self.client.messages.create(
            model=os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-5"),
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            messages=[{
                "role": "user",
                "content": f"Requirement: {requirement}\n\nCodebase:\n{codebase}"
            }]
        )

        # Assumes the reply is bare JSON; any surrounding text raises json.JSONDecodeError
        return json.loads(response.content[0].text)

    def generate_spec_md(self, plan: dict, output_path: Path):
        """Convert JSON plan to a Markdown specification document"""
        with open(output_path, 'w') as f:
            f.write("# Project Specification\n\n")
            for task in plan['tasks']:
                f.write(f"## {task['id']}: {task['description']}\n")
                f.write(f"- Target files: {', '.join(task['target_files'])}\n")
                f.write("- Acceptance criteria:\n")
                for criterion in task['acceptance_criteria']:
                    f.write(f"  - [ ] {criterion}\n")
                f.write("\n")

After the basic implementation, two core capabilities must be added: automatic validation of specification quality and construction of the task dependency DAG.
    def validate_spec(self, plan: dict) -> tuple[bool, list[str]]:
        """Automatically validate the quality of a generated specification"""
        issues = []

        # out_of_scope is declared once at spec level, so check it outside the task loop
        if 'out_of_scope' not in plan:
            issues.append("spec-level out_of_scope not defined")

        for task in plan['tasks']:
            # Validate completeness of acceptance criteria
            if not task.get('acceptance_criteria'):
                issues.append(f"{task['id']}: acceptance_criteria missing")
            elif len(task['acceptance_criteria']) < 2:
                issues.append(f"{task['id']}: at least 2 acceptance_criteria required")

            # Check for presence of assumptions
            if 'assumptions' not in task:
                issues.append(f"{task['id']}: assumptions field missing")

        return len(issues) == 0, issues

    def create_task_dag(self, plan: dict, output_dir: Path):
        """Generate TASK files and build the dependency DAG (implements Week 7 tier concept)"""
        task_map = {t['id']: t for t in plan['tasks']}

        # Tier calculation: tasks with no dependencies = tier 1,
        # tasks with dependencies = max(dep_tier) + 1
        tiers = {}
        visiting = set()  # tasks on the current recursion path, used to detect cycles

        def get_tier(task_id: str) -> int:
            if task_id in tiers:
                return tiers[task_id]
            if task_id not in task_map:
                raise ValueError(f"unknown dependency: {task_id}")
            if task_id in visiting:
                raise ValueError(f"dependency cycle at {task_id}")
            visiting.add(task_id)
            deps = task_map[task_id].get('dependencies', [])
            tiers[task_id] = max((get_tier(d) for d in deps), default=0) + 1
            visiting.discard(task_id)
            return tiers[task_id]

        for task in plan['tasks']:
            get_tier(task['id'])

        # Generate TASK files
        output_dir.mkdir(parents=True, exist_ok=True)
        for task in plan['tasks']:
            task_file = output_dir / f"{task['id']}.md"
            with open(task_file, 'w') as f:
                f.write(f"---\nid: {task['id']}\ntier: {tiers[task['id']]}\n")
                f.write(f"dependencies: {task.get('dependencies', [])}\n---\n\n")
                f.write(f"## {task['description']}\n\n")
                f.write("### Acceptance Criteria\n")
                for ac in task['acceptance_criteria']:
                    f.write(f"- [ ] {ac}\n")

The existing analyze_codebase() returns only a file list. The context a coder agent actually needs is far richer.
Let’s look at the 3-stage analysis pattern used by the /architect skill in sdlc-toolkit.
import ast  # module-level import, needed by analyze_semantics

    def analyze_structure(self, project_root: Path) -> dict:
        """Build file/directory/module tree"""
        structure = {"files": [], "directories": set(), "modules": []}

        for f in project_root.rglob("*.py"):
            rel = f.relative_to(project_root)
            structure["files"].append(str(rel))
            structure["directories"].add(str(rel.parent))

            # Directories containing __init__.py = Python packages
            if (f.parent / "__init__.py").exists():
                # Join path parts with dots so this also works with Windows separators
                module = ".".join(rel.parent.parts)
                if module and module not in structure["modules"]:
                    structure["modules"].append(module)

        structure["directories"] = sorted(structure["directories"])
        return structure

    def analyze_semantics(self, project_root: Path) -> dict:
        """Extract function/class signatures and the import graph"""
        symbols = {}
        func_types = (ast.FunctionDef, ast.AsyncFunctionDef)

        for f in project_root.rglob("*.py"):
            rel = str(f.relative_to(project_root))
            try:
                tree = ast.parse(f.read_text())
            except (SyntaxError, UnicodeDecodeError):
                continue  # skip files that cannot be read or parsed

            file_symbols = {"classes": [], "functions": [], "imports": []}

            for node in ast.walk(tree):
                if isinstance(node, ast.ClassDef):
                    # Direct children only, so nested helper functions are not listed as methods
                    methods = [n.name for n in node.body if isinstance(n, func_types)]
                    file_symbols["classes"].append(
                        {"name": node.name, "methods": methods}
                    )
                elif isinstance(node, func_types) and node.col_offset == 0:
                    args = [a.arg for a in node.args.args]
                    file_symbols["functions"].append(
                        {"name": node.name, "args": args, "line": node.lineno}
                    )
                elif isinstance(node, ast.Import):
                    file_symbols["imports"].extend(alias.name for alias in node.names)
                elif isinstance(node, ast.ImportFrom):
                    # node.module is None for "from . import x"; keep the leading dots
                    file_symbols["imports"].append("." * node.level + (node.module or ""))

            symbols[rel] = file_symbols

        return symbols

    def analyze_context(self, project_root: Path) -> dict:
        """Extract existing conventions and architectural patterns"""
        context = {
            "naming_convention": None,
            "test_framework": None,
            "patterns": [],
            "entry_points": []
        }

        # Detect naming convention (snake_case vs camelCase) from file names only
        py_files = list(project_root.rglob("*.py"))
        snake_count = sum(1 for f in py_files if "_" in f.stem)
        context["naming_convention"] = (
            "snake_case" if snake_count > len(py_files) / 2 else "camelCase"
        )

        # Detect test framework (heuristic: test files without pytest config count as unittest)
        if (project_root / "pytest.ini").exists() or \
                any(project_root.rglob("conftest.py")):
            context["test_framework"] = "pytest"
        elif any(project_root.rglob("test_*.py")):
            context["test_framework"] = "unittest"

        # Detect entry points
        for candidate in ["main.py", "app.py", "run.py", "__main__.py"]:
            if (project_root / candidate).exists():
                context["entry_points"].append(candidate)

        return context

An analyze_codebase_full() method integrating all three stages summarizes each stage’s output to produce a compact representation for injection into the LLM. A structural summary, rather than the full source, increases context efficiency by more than 10×.
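The integrating method is not shown in the notes. One possible shape for its summarization step is sketched below as a standalone function. It assumes the three result dictionaries have the keys produced by analyze_structure, analyze_semantics, and analyze_context; the name `summarize_analysis`, the output layout, and the character cap are hypothetical choices.

```python
def summarize_analysis(structure: dict, symbols: dict, context: dict,
                       max_chars: int = 4000) -> str:
    """Render the three analysis results as a compact, prompt-ready text block."""
    lines = [
        f"Naming: {context.get('naming_convention')}",
        f"Tests: {context.get('test_framework')}",
        f"Entry points: {', '.join(context.get('entry_points', [])) or 'none'}",
        f"Packages: {', '.join(structure.get('modules', [])) or 'none'}",
        "Symbols:",
    ]
    for path in sorted(symbols):
        info = symbols[path]
        classes = [f"{c['name']}({', '.join(c['methods'])})" for c in info["classes"]]
        funcs = [f"{fn['name']}({', '.join(fn['args'])})" for fn in info["functions"]]
        if classes or funcs:
            lines.append(f"  {path}: " + "; ".join(classes + funcs))
    summary = "\n".join(lines)
    # Hard cap as a crude budget; a real planner might rank files by relevance instead
    return summary[:max_chars]


# Hand-written sample data in the shape the three analysis stages return
structure = {"files": ["calc/ops.py"], "directories": ["calc"], "modules": ["calc"]}
symbols = {
    "calc/ops.py": {
        "classes": [{"name": "Calculator", "methods": ["add", "sub"]}],
        "functions": [{"name": "parse", "args": ["text"], "line": 10}],
        "imports": ["math"],
    }
}
context = {
    "naming_convention": "snake_case",
    "test_framework": "pytest",
    "patterns": [],
    "entry_points": ["main.py"],
}
print(summarize_analysis(structure, symbols, context))
```

The summary carries signatures and conventions but no function bodies, which is where most of the size reduction comes from. How much is saved depends on the codebase.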
This is the /validate pattern for automatically verifying whether a generated spec.md is sufficient for a coder agent to execute. It implements the “3 retries → human escalation” flow designed in Week 7.
VALIDATION_CHECKLIST = [
    ("frontmatter_valid", lambda s: bool(s.get("title") and s.get("description"))),
    # Keyword heuristic: passes if the description mentions "what" or "why"
    ("has_what_and_why", lambda s: "what" in s.get("description", "").lower()
                                   or "why" in s.get("description", "").lower()),
    # An empty criteria list must fail; all() over an empty list would otherwise pass
    ("ac_specific", lambda s: bool(s.get("acceptance_criteria")) and all(
        len(ac) > 20 and not ac.endswith("...")
        for ac in s.get("acceptance_criteria", [])
    )),
    ("no_impl_details", lambda s: not any(
        kw in str(s.get("acceptance_criteria", ""))
        for kw in ["import", "def ", "class ", "```"]
    )),
]


class SpecValidator:
    def __init__(self, client: anthropic.Anthropic, max_retries: int = 3):
        self.client = client
        self.max_retries = max_retries

    def validate(self, spec: dict) -> tuple[bool, list[str]]:
        failures = []
        for name, check_fn in VALIDATION_CHECKLIST:
            try:
                if not check_fn(spec):
                    failures.append(name)
            except Exception as e:
                failures.append(f"{name}: error({e})")
        return len(failures) == 0, failures

    def validate_with_retry(self, spec: dict, requirement: str) -> dict:
        """Auto-regenerate and validate. Human escalation after max_retries failures."""
        failures: list[str] = []
        for attempt in range(self.max_retries):
            passed, failures = self.validate(spec)
            if passed:
                return spec

            print(f"[attempt {attempt+1}/{self.max_retries}] validation failed: {failures}")

            # Re-request generation with failures as feedback. Skip the call after the
            # final attempt, since its result would never be validated.
            if attempt < self.max_retries - 1:
                spec = self._regenerate_spec(spec, requirement, failures)

        # All attempts failed → human escalation
        raise ValueError(
            f"Spec auto-validation failed ({self.max_retries} attempts). Manual review required.\n"
            f"Last failures: {failures}"
        )

    def _regenerate_spec(self, spec: dict, requirement: str,
                         failures: list[str]) -> dict:
        feedback = "\n".join(f"- {f}" for f in failures)
        response = self.client.messages.create(
            model=os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-5"),
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": (
                    f"The following validation items failed for this specification:\n{feedback}\n\n"
                    # Include the spec itself so the model has something to revise
                    f"Current specification:\n{json.dumps(spec, indent=2)}\n\n"
                    f"Original requirement: {requirement}\n\n"
                    f"Revise the specification to pass validation. Output JSON."
                )
            }]
        )
        return json.loads(response.content[0].text)

Q1. Detecting Ambiguity
Where is the boundary between “ambiguous” and “specific” for acceptance criteria in a planner-generated spec.md? Can ambiguity be detected automatically? Propose a mechanical definition of ambiguity.
Q2. Benefits of Separation
sdlc-toolkit intentionally separates requirements specification (/spec) from architecture design (/architect). What are the benefits of this separation? What problems arise if the two stages are merged into one?
Q3. Context Rot Trade-offs
Analyze the trade-offs between injecting the entire codebase into context versus injecting a summary, from the perspective of Context Rot as covered in Week 5. At what point should you switch to a summarization strategy?
Q4. MetaGPT vs sdlc-toolkit
Compare MetaGPT’s PM role with the /spec skill in sdlc-toolkit. Which is more deterministic? What design principles increase determinism?
Implement the Planner Agent — Extend the code above as a starting point
Enhance Codebase Analysis — Go beyond a simple file list to analyze functions and classes
spec.md Generation Pipeline — Apply it to a real project (the calculator from Lab 04)
Validate the Generated spec.md — Verify that a Coder Agent can actually execute the tasks
/spec (what) → /architect (how). Applying the Separation of Concerns principle to agent design.
The dependencies array in TASK files and the tier calculation implement the parallelization design from Week 7. Tier 1 tasks can run concurrently.
MetaGPT (ICLR 2024)
SOP-based multi-agent framework. Empirical demonstration of the PM → Architect → Engineer sequence and the correlation between document quality and code quality. The key reference for planner design.
SWE-agent (Princeton, 2024)
Automated codebase exploration and planning. Presents methods for LLMs to effectively navigate file systems through Agent-Computer Interface (ACI) design.
Anthropic Claude Code Agent Tool
Sub-agent spawning, model routing, and worktree isolation. Real-world patterns for implementing planner-coder separation in production.
sdlc-toolkit /spec + /architect
Production implementation of the requirements-architecture separation pattern. A comprehensive reference including LESSON-file-based learning from past failures, the validate gate, and TASK DAG generation.
Submission deadline: 2026-04-28 23:59
Requirements:
PlannerAgent implementation (planner_agent.py)
spec.md generated by applying it to a real project