Concepts
Compare traditional and agentic SDLCs across four phases and identify exactly where a single-agent approach stalls.
Design
Distribute Lead, Planner, Worker, Reviewer, and Operator roles, compare linear / hub-spoke / mesh topologies, and decide what fits the team.
Implementation
Author artifact-based handoff contracts (spec.md, plan.md, worker_report.md) and run a single end-to-end cycle.
Operations
Integrate quality gates across the multi-agent pipeline so telemetry can answer “which role failed.”
Through Week 6 we covered single agents — regulating behavior with CLAUDE.md (Week 6), preventing Context Rot (Week 5), and securing iterative quality with the Ralph Loop (Week 4). The question now is: where does the single-agent approach hit its limits?
A joint study by DeepMind and MIT, “Towards a Science of Scaling Agent Systems” (December 2025), provides decisive data:
An unstructured collection of agents (“bag of agents”) amplifies errors by 17.2×. In contrast, centralized coordination reduces error amplification to 4.4×.
The same pattern appears on SWE-Bench Pro — the same model (Claude Opus 4.5) shows a score range from 45.9% to 55.4% depending on the scaffolding. It is the system design wrapping the model, not the model itself, that determines performance.
This week covers the architecture and design principles of multi-agent SDLC. Code implementation happens in Week 8 (Planner Agent) and Week 9 (QA Agent).
| Traditional Role | Agentic Equivalent | Tool Access (MCP) | Output Artifact |
|---|---|---|---|
| Product Manager | Planner Agent | Web search, document reading | requirement.md |
| Software Architect | Architect Agent | Repository mapping, dependency analysis | architecture.md, TASK files |
| Developer | Coder Agent (Ralph Loop) | File editing, compiler, tests | Code changes, PR |
| QA Engineer | QA Agent | pytest, diff viewer, linter | Review results, severity report |
| DevOps | Deploy Agent | Docker, CI/CD, monitoring | Deploy results, smoke tests |
| Release Manager | Completion Agent | Git merge, tagging | ship-summary, release notes |
| Knowledge Manager | Retrospective Agent | File read/write | LESSON files, assumption verification |
This role separation is a pattern validated in academia as well:
What does the diagram above look like when implemented as a real production system? The diagram below visualizes the full pipeline of sdlc-toolkit — knowledge feedback loops, validation gates, and lesson capture.
Spec-based development lifecycle with knowledge feedback loops
/spec
References lessons. Writes a requirements spec based on the feature request.
/validate
Validates spec quality before architectural design begins.
/architect
References lessons. Designs the architecture and breaks it into detailed tasks (TASKs).
/validate
Validates the quality of the architecture and tasks.
Implement
Codes tasks in dependency order; independent tasks are processed in parallel.
/reflect
References lessons. Conducts a self-review after implementation is complete.
/review
Performs a multi-agent code review to ensure quality and correctness.
Create & Merge PR
Opens a pull request, passes final review, then merges.
/wrapup
Updates deployment and artifacts, then captures lessons learned and assumptions from development.
Lessons are captured via /wrapup at the end of every feature development cycle. Each lesson records what happened, why it matters, and when it applies.
The pipeline creates a continuous improvement cycle by reading lessons before performing work at three key stages:
← /spec: Prevents repeating approaches that previously failed
← /architect: Reuses proven patterns and avoids past mistakes
← /reflect: Confirms that lessons relevant to the current task are applied
Quality checks run between major stages. Up to 3 automatic fix retries are performed before halting the pipeline.
Assumptions are tracked continuously alongside lessons. /architect references them when making architectural design decisions.
/proceed automatically runs the entire pipeline above in sequence, including validation gates and automatic fix retries.
Lightweight path — skips the spec and architecture stages for fast bug fixes.
The /proceed pipeline of sdlc-toolkit implements a 9-stage gated execution.
| Phase | Name | Agent | Gate |
|---|---|---|---|
| 0 | Create Worktree | Orchestrator | Branch isolation check |
| 1 | Validate Spec | Validator | Requirements completeness |
| 2 | Architecture + Task Decomposition | Architect | Dependency DAG validity |
| 3 | Validate Architecture | Validator | Pattern compatibility, task coverage |
| 4 | Implement (parallel) | Coder × N | Each task AC satisfied |
| 5 | Verify (Reflect + Review) | QA | PASS/FAIL verdict |
| 6 | Create PR | Orchestrator | CI passing |
| 7 | PR Cleanup + CI | Orchestrator | Lint/test passing |
| 8 | Wrapup (merge, deploy, knowledge capture) | Wrapup | LESSON file created |
Core principle: Each phase only starts after explicitly confirming completion of the previous phase. No skipping allowed.
```json
{
  "req": "REQ-023",
  "branch": "feat/REQ-023-user-auth",
  "startedAt": "2026-04-14T10:00:00Z",
  "completed": false,
  "currentPhase": 4,
  "completedPhases": [0, 1, 2, 3],
  "phaseHistory": [
    { "phase": 0, "name": "Create Worktree", "completedAt": "2026-04-14T10:01:00Z" },
    { "phase": 1, "name": "Validate Spec", "completedAt": "2026-04-14T10:03:00Z" },
    { "phase": 2, "name": "Architect", "completedAt": "2026-04-14T10:08:00Z" },
    { "phase": 3, "name": "Validate Architecture", "completedAt": "2026-04-14T10:10:00Z" }
  ]
}
```

pipeline-state.json tracks the progress of the entire pipeline. On interruption, the pipeline resumes from currentPhase. This file itself is the synchronization mechanism for inter-agent handoffs.
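The resume and no-skipping rules can be sketched in a few lines of Python. This is a minimal illustration, not sdlc-toolkit's actual code: the function names `next_phase` and `complete_phase` and the shortened `PHASES` list are assumptions, while the field names come from the pipeline-state.json example above.

```python
from typing import Optional

# Hypothetical short names for the 9 phases in the table above.
PHASES = [
    "Create Worktree", "Validate Spec", "Architect", "Validate Architecture",
    "Implement", "Verify", "Create PR", "PR Cleanup + CI", "Wrapup",
]

def next_phase(state: dict) -> Optional[int]:
    """Return the phase to run next, or None when every phase is done."""
    done = sorted(state["completedPhases"])
    # No skipping: completed phases must form the unbroken prefix 0..k-1.
    if done != list(range(len(done))):
        raise ValueError(f"completedPhases has a gap: {done}")
    return None if len(done) == len(PHASES) else len(done)

def complete_phase(state: dict, phase: int, completed_at: str) -> dict:
    """Record a finished phase and return the updated state as a new dict."""
    if phase != next_phase(state):
        raise ValueError(f"phase {phase} is not the next phase to run")
    last = len(PHASES) - 1
    return {
        **state,
        "completedPhases": state["completedPhases"] + [phase],
        "phaseHistory": state["phaseHistory"]
        + [{"phase": phase, "name": PHASES[phase], "completedAt": completed_at}],
        "currentPhase": min(phase + 1, last),
        "completed": phase == last,
    }
```

Because `next_phase` derives the resume point from `completedPhases` rather than trusting `currentPhase` alone, a corrupted or hand-edited state file is rejected instead of silently skipping a gate.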
When a validation gate fails:
```
Failure detected → Automatic fix attempt (attempt 1)
        ↓ fails
Re-validate → Automatic fix attempt (attempt 2)
        ↓ fails
Re-validate → Automatic fix attempt (attempt 3)
        ↓ fails
⚠️ Human escalation: pipeline paused
```

Rationale for the 3-attempt cap: If an agent fails to fix the same error three times, it does not understand the problem. Further attempts only consume tokens without improving quality. PwC research showed that combining a validation loop with a judge agent improved accuracy 7×, from 10% to 70%; this is the gate mechanism at work.
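The retry flow above maps onto a small loop. The sketch below is illustrative only: `run_gate`, `HumanEscalation`, and the `validate` / `auto_fix` callables are hypothetical names, not part of sdlc-toolkit.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 3  # the cap described above

class HumanEscalation(Exception):
    """Raised when the gate still fails after the retry cap; the pipeline pauses."""

def run_gate(
    validate: Callable[[], bool],
    auto_fix: Callable[[int], None],
    max_attempts: int = MAX_FIX_ATTEMPTS,
) -> int:
    """Run a validation gate; return how many fix attempts were needed."""
    if validate():
        return 0
    for attempt in range(1, max_attempts + 1):
        auto_fix(attempt)   # automatic fix attempt N
        if validate():      # re-validate after each fix
            return attempt
    raise HumanEscalation(f"gate still failing after {max_attempts} fix attempts")
```

Raising an exception rather than returning a flag makes the pause hard to ignore: the orchestrator has to handle the escalation explicitly before any later phase can start.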
How are agents in a multi-agent system coordinated? There are three fundamental topologies:
| Topology | Structure | Examples | Error Amplification |
|---|---|---|---|
| Centralized | Single orchestrator controls sequencing | sdlc-toolkit /proceed, Claude Code Agent Tool | 4.4× (DeepMind) |
| Hierarchical | Orchestrator of orchestrators | sdlc-toolkit /sprint (spawns 5 /proceed in parallel) | 4.4× × management overhead |
| Distributed (peer-to-peer) | Agents communicate directly with each other | “bag of agents” | 17.2× (DeepMind) |
How agents access external tools and how agents communicate with each other are different problems:
| Protocol | Purpose | Scale | Core Structure |
|---|---|---|---|
| MCP (Anthropic, 2024) | Agent → tool access | 97M+ monthly SDK downloads, 6,400+ servers | Server/Client, Tool/Resource |
| A2A (Google, 2025) | Agent → agent delegation | 150+ partner organizations | Task, Artifact, Agent Card |
| Artifact Handoff (this week) | Agent → agent (file-based) | Project local | Markdown/JSON files |
MCP was donated to the Linux Foundation (December 2025) and has established itself as an industry standard. A2A is an agent-to-agent delegation protocol proposed by Google, using Task/Artifact/Agent Card JSON structures to declare agent capabilities and delegate work.
The artifact handoff covered this week is the simplest yet most deterministic approach — the filesystem serves as the communication channel, making everything debuggable, auditable, and reproducible.
Before building the full pipeline above from scratch, let’s first understand the lightweight multi-agent tools built into Claude Code. These tools, revealed by Boris Cherny in February 2026, replace each pipeline stage with a single CLI flag.
```
# Press Shift+Tab to enter Plan Mode
# Draft plan → user confirmation → auto-execute
```

Pressing Shift+Tab makes Claude Code draft a plan before writing any code. Once the plan is confirmed, it automatically proceeds to implementation. Boris: “Claude 1-shots the implementation when the plan is right.”
This performs what the Planner Agent above does — requirements parsing, spec.md generation, priority assignment — in an interactive conversational flow. When you build PlannerAgent from scratch in Week 8, you’ll understand the internal structure of this process.
Add Markdown files to the .claude/agents/ directory to define specialized agents:
```markdown
---
name: code-simplifier
description: Code simplification specialist agent
tools: [Read, Edit, Grep, Glob]
---

Review changed code to:
1. Leverage existing reusable functions
2. Remove unnecessary complexity
3. Apply consistent patterns
```

```markdown
---
name: verify-app
description: Application verification agent
tools: [Read, Bash, Grep]
---

Verify changes by:
1. Confirming all tests pass
2. Confirming build succeeds
3. Confirming no lint errors
```

```bash
# Run with a specific agent
claude --agent code-simplifier "Refactor this module"

# Set default agent in settings.json
# { "defaultAgent": "code-simplifier" }
```

This is the role assignment design above (Coder Agent, QA Agent) implemented declaratively in a single .md file. The same principle of MCP-governed tool access applies via the tools field. The 11 skills of sdlc-toolkit (/spec, /architect, /validate, /review, etc.) are production examples that use this Skills system.
/simplify: Built-in QA Agent

```bash
# Auto-review after code changes
claude /simplify
```

Parallel agents review changed code simultaneously across three dimensions: reuse, quality, and efficiency. Boris: “It catches the structural issues a senior engineer would flag in the first five minutes of code review.”
/batch: Large-Scale Parallel Execution Engine

```bash
# Interactive planning → parallel execution
claude /batch "Migrate logging in src/ to the new structured logger"
```

/batch operates in three stages:
Boris’s team case: 6 parallel agents migrating logging across 14 files. Total: 11 minutes. 5 of 6 PRs merged without changes. The remaining one required human judgment on a conditional logging edge case.
This is the same multi-agent pipeline principle above — Planner → Coder × N → QA — packaged at product level.
```bash
# Install a skill (example; verify the actual URL with the skill distributor)
mkdir -p ~/.claude/skills/boris
curl -L -o ~/.claude/skills/boris/SKILL.md \
  https://example.com/skills/boris/SKILL.md

# Or write your own SKILL.md and place it directly

# Load skill in a session
claude /skills boris
```

This extends the instruction tuning from Week 6 (adding constraints to PROMPT.md) into reusable packages. Boris’s own 42 tips are packaged as a single skill, loadable in any project.
| Aspect | Full Pipeline (Weeks 7-9) | Native Tools (Boris) |
|---|---|---|
| Setup cost | High — JSON schemas, agent code implementation | Low — .md files, CLI flags |
| Flexibility | Unlimited — custom handoff logic, feedback loops | Limited — within preset capabilities |
| Inter-agent comms | Artifact-based (JSON schema contracts) | None — each agent runs independently |
| Verification | QA agent runs integration tests + code review | /simplify catches structural issues only |
| Error recovery | Gated retries (3×) + human escalation | None — manual restart on failure |
| Best for | Complex multi-stage workflows, custom quality criteria | Large-scale parallel processing of repetitive tasks |
The key to inter-agent communication is structured artifacts. Not natural-language messages, but schema-defined files that move between agents.
```markdown
---
id: REQ-023
title: "Add user authentication feature"
status: draft        # draft → approved → in-progress → complete
deployable: true
created: 2026-04-14
updated: 2026-04-14
---

## Description
Implement a JWT-based user authentication system. Includes login, sign-up, and token refresh.

## Acceptance Criteria
- [ ] POST /auth/login endpoint works
- [ ] JWT token issuance and verification
- [ ] Password bcrypt hashing
- [ ] Automatic token refresh on expiry

## Assumptions
- Using PostgreSQL (leverages existing DB connection)
- Token validity: access 15 min, refresh 7 days

## Out of Scope
- OAuth2 social login (separate REQ)
- 2FA (separate REQ)
```

Generated by Planner Agent → Validator verifies → Architect Agent consumes
```markdown
---
id: TASK-001
title: "JWT token generation module"
status: draft
parent: REQ-023
created: 2026-04-14
updated: 2026-04-14
dependencies: []     # Tier 0: no dependencies → can run in parallel
---

## Files to Create/Modify
- `auth/jwt.py`: token creation/verification utility
- `tests/test_jwt.py`: unit tests

## Acceptance Criteria
- [ ] create_token(payload, secret) → JWT string
- [ ] verify_token(token, secret) → payload or exception
- [ ] TokenExpiredError raised when verifying an expired token
```

```markdown
---
id: TASK-003
title: "Login API endpoint"
status: draft
parent: REQ-023
dependencies: ["TASK-001", "TASK-002"]   # Tier 1: waits for TASK-001, 002
---
```

Generated by Architect Agent → the dependencies array determines execution order
```markdown
---
id: LESSON-042
title: "Rotating JWT secret invalidates existing tokens"
domain: "API"            # broad area
component: "API/auth"    # narrow area
tags: [security, jwt]
req: REQ-023
created: 2026-04-14
---

## What Happened
Rotating the secret key invalidated all previously issued tokens.

## Lesson
When rotating secrets, implement multi-key verification logic so tokens
issued with the previous key remain valid during a grace period.

## Applies When
JWT secret management, key rotation, authentication system changes
```

Generated by Wrapup Agent → future /spec and /architect grep by domain:API + component:API/auth to automatically reference this lesson
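The domain/component lookup can be approximated without any YAML library. The sketch below assumes flat `key: value` front matter like the LESSON example above; `parse_front_matter` and `relevant_lessons` are hypothetical helpers, and a real implementation might simply shell out to grep as described.

```python
def parse_front_matter(text: str) -> dict:
    """Parse flat `key: value` front matter between the first two `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop a trailing inline comment, then surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip('"')
        meta[key.strip()] = value
    return meta

def relevant_lessons(lesson_texts: list, domain: str, component: str) -> list:
    """Return ids of lessons whose domain and component both match."""
    hits = []
    for text in lesson_texts:
        meta = parse_front_matter(text)
        if meta.get("domain") == domain and meta.get("component") == component:
            hits.append(meta.get("id", "(no id)"))
    return hits
```

Exact matching on both tags keeps the injected context small; loosening it to domain-only matching would surface more lessons at the cost of more tokens per run.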
The dependencies array in TASK files determines execution order:
```
Tier 0 (no dependencies):    TASK-001, TASK-002 → run concurrently
        ↓ wait for completion
Tier 1 (depends on Tier 0):  TASK-003, TASK-004 → run concurrently
        ↓ wait for completion
Tier 2 (depends on Tier 1):  TASK-005 → run alone
```

This tier-based parallelization operates in Phase 4 (Implementation) of the pipeline. Independent tasks run in parallel in separate worktrees; tasks with dependencies wait for their predecessors to complete.
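Deriving the tiers from the dependencies arrays is a layered topological sort. The following is a minimal sketch under stated assumptions: `build_tiers` is a hypothetical name, and the dependency lists for TASK-002, TASK-004, and TASK-005 in the usage example are invented to match the tier diagram (only TASK-001 and TASK-003 are specified in the source).

```python
def build_tiers(dependencies: dict) -> list:
    """Group task ids into tiers; every task in a tier can run concurrently."""
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    unknown = {d for deps in remaining.values() for d in deps} - set(remaining)
    if unknown:
        raise ValueError(f"dependencies on unknown tasks: {sorted(unknown)}")
    tiers, done = [], set()
    while remaining:
        # A task is ready once all of its dependencies have completed.
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        tiers.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return tiers

tasks = {
    "TASK-001": [],
    "TASK-002": [],                        # assumed
    "TASK-003": ["TASK-001", "TASK-002"],
    "TASK-004": ["TASK-001"],              # assumed
    "TASK-005": ["TASK-003", "TASK-004"],  # assumed
}
print(build_tiers(tasks))
# [['TASK-001', 'TASK-002'], ['TASK-003', 'TASK-004'], ['TASK-005']]
```

The cycle check doubles as the “Dependency DAG validity” gate from Phase 2: if no task is ready while some remain, the graph is not a DAG. Python's standard library also offers `graphlib.TopologicalSorter` for the same job.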
A single reviewer has blind spots — a reviewer strong in security misses performance issues; focusing on architecture means overlooking edge cases. Production systems run 3 specialist reviewers in parallel:
| Reviewer | Review Area | Severity Criteria |
|---|---|---|
| Correctness Reviewer | Logic errors, race conditions, security vulnerabilities, edge cases | Critical: data loss / security violation |
| Quality Reviewer | Naming, pattern consistency, duplicate code, hardcoded config | Major: maintainability degradation |
| Architecture Reviewer | Layer separation, separation of concerns, test coverage, API compliance | Major: structural debt |
Severity scale: Critical > Major > Minor > Nit. Any Critical finding means FAIL — feedback is automatically sent back to the coder.
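The verdict rule can be expressed directly in code. This is a sketch only: the `Severity`, `Finding`, and `aggregate` names are hypothetical, and treating Major findings as non-blocking is an assumption, since the text specifies only that a Critical finding forces FAIL.

```python
from enum import IntEnum
from typing import NamedTuple

class Severity(IntEnum):
    NIT = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3

class Finding(NamedTuple):
    reviewer: str
    severity: Severity
    message: str

def aggregate(findings: list) -> tuple:
    """Merge findings from all reviewers into (verdict, findings sorted worst-first)."""
    ordered = sorted(findings, key=lambda f: f.severity, reverse=True)
    failed = any(f.severity == Severity.CRITICAL for f in ordered)
    return ("FAIL" if failed else "PASS", ordered)
```

Sorting worst-first means the feedback sent back to the coder leads with the finding that caused the FAIL.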
On top of this 3-parallel review pattern sits a 2-stage structure:
/reflect (self-review): The coder agent reviews its own code first, catching obvious mistakes to reduce the burden on independent review.
/review (independent review): Three reviewers in parallel, with no knowledge of the coder’s reasoning.
Boris’s /simplify is a lightweight version of this pattern: it uses the same parallel review principle but catches only structural issues, without domain specialization. This design is implemented in Python in Week 9.
Multi-agent systems have unique failure modes absent in single-agent setups:
| Failure Mode | Description | Mitigation Strategy |
|---|---|---|
| Context Rot propagation | Context lost at each handoff (Week 5 reference) | Artifact-based handoffs — structured files preserve context |
| 17× error trap | Silent error compounding in unstructured agent networks | Centralized coordination + gated validation |
| Hallucination propagation | One agent’s hallucination becomes the next agent’s ground truth | Independent validation gate at each phase |
| Infinite refinement loop | QA→Coder→QA cycles without convergence | Retry cap (3×) + human escalation |
| State desynchronization | File conflicts between parallel agents | Git worktree isolation — each agent has an independent workspace |
| Cost escalation | Uncontrolled agent spawning | Concurrency cap (5 agents) + model tier routing (exploration: haiku, implementation: sonnet, review: opus) |
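The cost-escalation mitigation in the last row combines two mechanisms, a concurrency cap and per-stage model routing. Below is a minimal sketch using a thread pool; `route_model`, `run_agents`, and the `run_one` callable are hypothetical, and the model names are just the tier labels from the table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_CONCURRENT_AGENTS = 5  # concurrency cap from the table above

# Stage-to-model routing from the table above.
MODEL_BY_STAGE = {
    "exploration": "haiku",
    "implementation": "sonnet",
    "review": "opus",
}

def route_model(stage: str) -> str:
    """Pick the model tier for a stage; unknown stages are rejected."""
    if stage not in MODEL_BY_STAGE:
        raise ValueError(f"no model configured for stage: {stage}")
    return MODEL_BY_STAGE[stage]

def run_agents(jobs: list, run_one: Callable[[str, str], str]) -> list:
    """Run (stage, task) jobs with at most MAX_CONCURRENT_AGENTS in flight.

    Results come back in the same order as `jobs`.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_AGENTS) as pool:
        futures = [
            pool.submit(run_one, task, route_model(stage)) for stage, task in jobs
        ]
        return [f.result() for f in futures]
```

The pool size is the cap: a sixth job simply waits in the queue until a worker frees up, so spawning can never exceed five agents however many tasks are submitted.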
Explain the mechanism by which “bag of agents” amplifies errors 17.2× in the DeepMind study. Why does structured coordination reduce this to 4.4×?
On SWE-Bench Pro, the same model shows a score difference of 45.9%–55.4% depending on the scaffolding. Use this data to argue the claim that “the harness matters as much as the model.”
What are the trade-offs between passing natural-language messages between agents versus structured artifact (JSON/Markdown) handoffs? In which situations does each approach excel?
In the /proceed pipeline, what is the rationale for escalating to humans after a maximum of 3 retries at each gate? What problems arise if the retry count is raised to 10?
When applying the single-agent instruction tuning from Week 6 (CLAUDE.md) to a multi-agent system, how would you separate common rules from role-specific rules? Reference sdlc-toolkit’s conventions.md (common) and individual SKILL.md (role-specific) structure.
Role Assignment Design
Given a project specification, design the roles, responsibilities, and MCP tool access permissions for 5 agents (Planner, Architect, Coder, QA, Wrapup).
Define Artifact Schemas
Define the schema for every artifact passed between agents. Minimum 3 types: requirement spec, task file (with dependency array), pipeline state.
Dependency DAG Design
Decompose a given requirement into TASK files, draw the dependency graph, and identify tiers that can run in parallel.
Validation Gate Design
Define the verification checklist for each phase transition. Customize the /validate checklist above to fit your project.
Error Recovery Scenarios
Document recovery strategies for 3 failure scenarios (test failure, gate exceeding 3 retries, merge conflict).
Submission deadline: 2026-04-21 23:59
Requirements:
requirement.md, TASK-xxx.md, pipeline-state.json) carry the handoffs.
domain/component tags in LESSON files automatically inject past lessons into future specs and architectures.