
Week 7: Multi-Agent SDLC Design

Phase 3 · Week 7 · Advanced · Lecture: 2026-04-14

Through Week 6 we covered single agents — regulating behavior with CLAUDE.md (Week 6), preventing Context Rot (Week 5), and securing iterative quality with the Ralph Loop (Week 4). The question now is: where does the single-agent approach hit its limits?

A joint study by DeepMind and MIT, “Towards a Science of Scaling Agent Systems” (December 2025), provides decisive data:

An unstructured collection of agents (“bag of agents”) amplifies errors by 17.2×. In contrast, centralized coordination reduces error amplification to 4.4×.

The same pattern appears on SWE-Bench Pro — the same model (Claude Opus 4.5) shows a score range from 45.9% to 55.4% depending on the scaffolding. It is the system design wrapping the model, not the model itself, that determines performance.

This week covers the architecture and design principles of multi-agent SDLC. Code implementation happens in Week 8 (Planner Agent) and Week 9 (QA Agent).


| Traditional Role | Agentic Equivalent | Tool Access (MCP) | Output Artifact |
| --- | --- | --- | --- |
| Product Manager | Planner Agent | Web search, document reading | requirement.md |
| Software Architect | Architect Agent | Repository mapping, dependency analysis | architecture.md, TASK files |
| Developer | Coder Agent (Ralph Loop) | File editing, compiler, tests | Code changes, PR |
| QA Engineer | QA Agent | pytest, diff viewer, linter | Review results, severity report |
| DevOps | Deploy Agent | Docker, CI/CD, monitoring | Deploy results, smoke tests |
| Release Manager | Completion Agent | Git merge, tagging | ship-summary, release notes |
| Knowledge Manager | Retrospective Agent | File read/write | LESSON files, assumption verification |

This role separation is a pattern validated in academia as well:

  • MetaGPT (ICLR 2024): Connects PM, Architect, Project Manager, Engineer, and QA through SOP (Standard Operating Procedure)-based structured documents. Structured document handoffs between roles — not natural-language chat — are the key.
  • ChatDev (ACL 2024, v2.0 January 2026): Demonstrated via chat-based phase execution that role specialization consistently outperforms monolithic prompting.

MULTI-AGENT PIPELINE

Human (HIC) — requirements input
  ↓
Planner Agent
  • Parse requirements
  • Generate spec.md
  • Determine priorities
  ↓ passes spec.md
Initializer Agent
  • Analyze codebase
  • Decompose subtasks
  • Generate init.sh
  ↓ passes task_queue.json
Coder Agent × N (Ralph Loop)
  • Execute tasks in parallel
  • Must pass local tests
  ↓ creates PR
QA Agent
  • Independent code review
  • Run integration tests
  • Regression verification
  ↓ approve/reject
Deploy Agent
  • Staging deployment
  • E2E tests
  • Human final approval (Hard Interrupt)
  ↓
Production Deployment

What does the diagram above look like when implemented as a real production system? The diagram below visualizes the full pipeline of sdlc-toolkit — knowledge feedback loops, validation gates, and lesson capture.

SDLC Pipeline

Spec-based development lifecycle with knowledge feedback loops:

1. /spec (references lessons) — writes a requirements spec based on the feature request.
   ↳ Gate /validate — validates spec quality before architectural design begins.
2. /architect (references lessons) — designs the architecture and breaks it into detailed tasks (TASKs).
   ↳ Gate /validate — validates the quality of the architecture and tasks.
3. Implement — codes tasks in dependency order; independent tasks are processed in parallel.
4. /reflect (references lessons) — conducts a self-review after implementation is complete.
5. /review — performs a multi-agent code review to ensure quality and correctness.
6. Create & Merge PR — opens a pull request, passes final review, then merges.
7. /wrapup — updates deployment and artifacts, then captures lessons learned and assumptions from development.

Supporting mechanisms:

  • Lessons Learned (.sdlc/knowledge/lessons/): captured via /wrapup at the end of every feature development cycle. Each lesson records what happened, why it matters, and when it applies.
  • Feedback Loop: creates a continuous improvement cycle by reading lessons before performing work at three key stages (/spec, /architect, /reflect).
  • Validation Gates: quality checks run between major stages. Up to 3 automatic fix retries are performed before halting the pipeline.
  • Assumptions (.sdlc/knowledge/assumptions/): tracked continuously alongside lessons. /architect references this content when making architectural design decisions.
  • /proceed REQ-xxx: automatically runs the entire pipeline above in sequence, including validation gates and automatic fix retries.
  • /bugfix: lightweight path — skips the spec and architecture stages for fast bug fixes.

The /proceed pipeline of sdlc-toolkit implements a 9-stage gated execution.

| Phase | Name | Agent | Gate |
| --- | --- | --- | --- |
| 0 | Create Worktree | Orchestrator | Branch isolation check |
| 1 | Validate Spec | Validator | Requirements completeness |
| 2 | Architecture + Task Decomposition | Architect | Dependency DAG validity |
| 3 | Validate Architecture | Validator | Pattern compatibility, task coverage |
| 4 | Implement (parallel) | Coder × N | Each task AC satisfied |
| 5 | Verify (Reflect + Review) | QA | PASS/FAIL verdict |
| 6 | Create PR | Orchestrator | CI passing |
| 7 | PR Cleanup + CI | Orchestrator | Lint/test passing |
| 8 | Wrapup (merge, deploy, knowledge capture) | Wrapup | LESSON file created |

Core principle: Each phase only starts after explicitly confirming completion of the previous phase. No skipping allowed.
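The gating rule above can be sketched in a few lines of Python — a hypothetical `run_pipeline` that re-attempts each phase up to three times before escalating. The names (`Phase`, `GateFailed`, `run_pipeline`) are illustrative, not sdlc-toolkit's actual code:

```python
# Minimal sketch of gated phase execution with a retry cap and human escalation.
from dataclasses import dataclass
from typing import Callable

MAX_RETRIES = 3  # per-gate cap before the pipeline halts and escalates

@dataclass
class Phase:
    name: str
    run: Callable[[], None]   # the agent's work for this phase
    gate: Callable[[], bool]  # validation gate: True = pass

class GateFailed(Exception):
    """Raised when a gate still fails after MAX_RETRIES attempts."""

def run_pipeline(phases: list[Phase]) -> None:
    for phase in phases:
        for _attempt in range(1, MAX_RETRIES + 1):
            phase.run()       # attempt (or re-attempt) the work
            if phase.gate():
                break         # gate passed — only now may the next phase start
        else:
            # retries exhausted: stop the pipeline, hand control to a human
            raise GateFailed(f"{phase.name}: escalating after {MAX_RETRIES} retries")

# Demo: a phase whose gate only passes on the second attempt
state = {"attempts": 0}
run_pipeline([
    Phase("Implement",
          lambda: state.update(attempts=state["attempts"] + 1),
          lambda: state["attempts"] >= 2),
])
print(state["attempts"])  # 2
```

The `for/else` makes the no-skipping rule explicit: control only falls through to escalation when every retry has failed the gate.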


How are agents in a multi-agent system coordinated? There are three fundamental topologies:

| Topology | Structure | Examples | Error Amplification |
| --- | --- | --- | --- |
| Centralized | Single orchestrator controls sequencing | sdlc-toolkit /proceed, Claude Code Agent Tool | 4.4× (DeepMind) |
| Hierarchical | Orchestrator of orchestrators | sdlc-toolkit /sprint (spawns 5 /proceed in parallel) | 4.4× × management overhead |
| Distributed (peer-to-peer) | Agents communicate directly with each other | “bag of agents” | 17.2× (DeepMind) |

How agents access external tools and how agents communicate with each other are different problems:

| Protocol | Purpose | Scale | Core Structure |
| --- | --- | --- | --- |
| MCP (Anthropic, 2024) | Agent → tool access | 97M+ monthly SDK downloads, 5,800+ servers | Server/Client, Tool/Resource |
| A2A (Google, 2025) | Agent → agent delegation | v0.2, 150+ partner orgs | Task, Artifact, Agent Card |
| AG-UI (CopilotKit, 2025) | Agent → user UI | LangGraph, CrewAI, MS integration | ~16 event types, streaming |
| Artifact Handoff (this week) | Agent → agent (file-based) | Project local | Markdown/JSON files |

The first three of these — MCP, A2A, and AG-UI — form the agentic AI protocol stack, often called “the TCP/IP of agentic AI”:

AG-UI ← Agent ↔ User (real-time streaming, approval UI)
A2A ← Agent ↔ Agent (discovery, delegation, task management)
MCP ← Agent ↔ Tools (tool invocation, data source access)

MCP was donated to AAIF under the Linux Foundation (December 2025) and is now the industry standard. A2A v0.2 supports stateless interactions and was enhanced at Google I/O with Agent Engine integration. AG-UI, originated from CopilotKit, is an event-driven protocol standardizing bidirectional streaming (SSE/WebSocket) between agent backends and user frontends.

The artifact handoff covered this week is the simplest yet most deterministic approach — the filesystem serves as the communication channel, making everything debuggable, auditable, and reproducible.
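A filesystem handoff can be as small as one agent writing JSON and the next reading it back. The file name and fields below are illustrative, not the real sdlc-toolkit schema:

```python
# Sketch: file-based artifact handoff between two agents. The filesystem is
# the channel, so every handoff can be inspected with `cat` or `git diff`.
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def planner_writes(workdir: Path) -> Path:
    """Planner Agent emits a structured artifact instead of a chat message."""
    artifact = {
        "id": "REQ-023",
        "status": "approved",
        "acceptance_criteria": ["POST /auth/login endpoint works"],
    }
    path = workdir / "pipeline-state.json"
    path.write_text(json.dumps(artifact, indent=2))
    return path

def architect_reads(path: Path) -> dict:
    """The next agent consumes the file; a simple assertion acts as a gate."""
    artifact = json.loads(path.read_text())
    assert artifact["status"] == "approved", "gate: only approved specs proceed"
    return artifact

with TemporaryDirectory() as d:
    spec = architect_reads(planner_writes(Path(d)))
    print(spec["id"])  # REQ-023
```

Because the artifact persists on disk, a failed run can be replayed from the exact same input — that is what makes the approach deterministic and auditable.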


Claude Code Native Multi-Agent — A Lightweight Alternative


Before building the full pipeline above from scratch, let’s first understand the lightweight multi-agent tools built into Claude Code. These tools, revealed by Boris Cherny in February 2026, replace each pipeline stage with a single CLI flag.

```shell
# Press Shift+Tab to enter Plan Mode
# Draft plan → user confirmation → auto-execute
```

Pressing Shift+Tab makes Claude Code draft a plan before writing any code. Once the plan is confirmed, it automatically proceeds to implementation. Boris: “Claude 1-shots the implementation when the plan is right.”

This performs what the Planner Agent above does — requirements parsing, spec.md generation, priority assignment — in an interactive conversational flow. When you build PlannerAgent from scratch in Week 8, you’ll understand the internal structure of this process.

Custom Agents — Declarative Role Specialization


Add Markdown files to the .claude/agents/ directory to define specialized agents:

```md
---
name: code-simplifier
description: Code simplification specialist agent
tools: [Read, Edit, Grep, Glob]
---
Review changed code to:
1. Leverage existing reusable functions
2. Remove unnecessary complexity
3. Apply consistent patterns
```

Agent Teams — Native Team Mode (Experimental)


If Custom Agents define individual roles, Agent Teams provide team coordination. An experimental feature enabled via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.

| Aspect | Subagents (Agent Tool) | Agent Teams |
| --- | --- | --- |
| Execution | Child process within parent session | Independent context windows |
| Communication | Reports only to parent | Direct messaging between teammates |
| Display | Results returned to parent only | Split-pane view with each teammate visible |

Subagents are like microservice calls; Agent Teams are like a Slack channel — teammates see each other’s progress and communicate directly when needed. This is the closest native tool for implementing this week’s multi-agent pipeline in Claude Code.

Custom Agents implement the role assignment design above — Coder Agent, QA Agent — declaratively in a single .md file. The same principle of MCP-governed tool access applies via the tools field. The 11 skills of sdlc-toolkit (/spec, /architect, /validate, /review, etc.) are production examples that leverage exactly this mechanism.

/simplify — Parallel Code Review

```shell
# Auto-review after code changes
claude /simplify
```

Parallel agents review changed code simultaneously across three dimensions: reuse, quality, and efficiency. Boris: “It catches the structural issues a senior engineer would flag in the first five minutes of code review.”

/batch — Large-Scale Parallel Execution Engine

```shell
# Interactive planning → parallel execution
claude /batch "Migrate logging in src/ to the new structured logger"
```

/batch operates in three stages:

  1. Interactive planning: Decomposes the task through conversation with the user
  2. Parallel execution: Runs each subtask in an independent worktree in parallel
  3. PR creation: Each agent opens an individual PR after its tests pass

Boris’s team case: 6 parallel agents migrating logging across 14 files. Total: 11 minutes. 5 of 6 PRs merged without changes. The remaining one required human judgment on a conditional logging edge case.

This is the same multi-agent pipeline principle above — Planner → Coder × N → QA — packaged at product level.

Skills System — Packaged Instruction Tuning

```shell
# Install a skill (example — verify actual URL from the skill distributor)
mkdir -p ~/.claude/skills/boris
curl -L -o ~/.claude/skills/boris/SKILL.md \
  https://example.com/skills/boris/SKILL.md

# Or write your own SKILL.md and place it directly

# Load the skill in a session
claude /skills boris
```

This extends the instruction tuning from Week 6 (adding constraints to PROMPT.md) into reusable packages. Boris’s own 42 tips are packaged as a single skill, loadable in any project.

Full Pipeline vs Native Tools — When to Use Which

| Aspect | Full Pipeline (Weeks 7-9) | Native Tools (Boris) |
| --- | --- | --- |
| Setup cost | High — JSON schemas, agent code implementation | Low — .md files, CLI flags |
| Flexibility | Unlimited — custom handoff logic, feedback loops | Limited — within preset capabilities |
| Inter-agent comms | Artifact-based (JSON schema contracts) | None — each agent runs independently |
| Verification | QA agent runs integration tests + code review | /simplify catches structural issues only |
| Error recovery | Gated retries (3×) + human escalation | None — manual restart on failure |
| Best for | Complex multi-stage workflows, custom quality criteria | Large-scale parallel processing of repetitive tasks |

Anthropic Managed Agents — A Third Option


Launched in April 2026 as a public beta, Managed Agents offer a third choice between full pipelines and native tools. Agents run on Anthropic’s cloud infrastructure, eliminating the need to build your own agent loop, tool execution, or runtime.

| Aspect | Full Pipeline | Native Tools | Managed Agents |
| --- | --- | --- | --- |
| Infrastructure | Self-built | Claude Code CLI | Anthropic cloud |
| Cost | API tokens only | API tokens only | $0.08/session-hour + tokens |
| Isolation | Git worktrees | Local processes | Cloud sandbox |
| Best for | Custom quality criteria, complex workflows | Personal dev, repetitive tasks | Enterprise deployment, audit trails |

Early adopters: Notion, Asana, Sentry, Rakuten. Handles file I/O, command execution, web browsing, and code execution server-side.


The key to inter-agent communication is structured artifacts. Not natural-language messages, but schema-defined files that move between agents.

```md
---
id: REQ-023
title: "Add user authentication feature"
status: draft # draft → approved → in-progress → complete
deployable: true
created: 2026-04-14
updated: 2026-04-14
---
## Description
Implement a JWT-based user authentication system. Includes login, sign-up, and token refresh.

## Acceptance Criteria
- [ ] POST /auth/login endpoint works
- [ ] JWT token issuance and verification
- [ ] Password bcrypt hashing
- [ ] Automatic token refresh on expiry

## Assumptions
- Using PostgreSQL (leverages existing DB connection)
- Token validity: access 15 min, refresh 7 days

## Out of Scope
- OAuth2 social login (separate REQ)
- 2FA (separate REQ)
```

Generated by Planner Agent → Validator verifies → Architect Agent consumes
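To illustrate how a consuming agent might read such an artifact, here is a stdlib-only parser sketch. It is hypothetical — a real pipeline would use a YAML library plus schema validation — but it shows why schema-defined files beat chat messages: the fields are machine-checkable.

```python
# Sketch: parse the frontmatter and acceptance-criteria checkboxes of a
# requirement.md file (illustrative; not the actual sdlc-toolkit parser).
import re

def parse_requirement(text: str) -> dict:
    # Split the "---"-delimited frontmatter from the Markdown body
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.split("#")[0].strip()  # drop inline comments
    # Acceptance criteria: "- [ ]" = open, "- [x]" = done
    criteria = re.findall(r"- \[( |x)\] (.+)", body)
    meta["acceptance_criteria"] = [
        {"done": mark == "x", "text": item} for mark, item in criteria
    ]
    return meta

doc = """---
id: REQ-023
status: draft # draft → approved → in-progress → complete
---
## Acceptance Criteria
- [ ] POST /auth/login endpoint works
- [x] JWT token issuance and verification
"""
req = parse_requirement(doc)
print(req["id"], req["status"], len(req["acceptance_criteria"]))  # REQ-023 draft 2
```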

Dependency DAG and Parallelization Strategy


The dependencies array in TASK files determines execution order:

Tier 0 (no dependencies): TASK-001, TASK-002 → run concurrently
↓ wait for completion
Tier 1 (depends on Tier 0): TASK-003, TASK-004 → run concurrently
↓ wait for completion
Tier 2 (depends on Tier 1): TASK-005 → run alone

This tier-based parallelization operates in Phase 4 (Implementation) of the pipeline. Independent tasks run in parallel in separate worktrees; tasks with dependencies wait for their predecessors to complete.
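The tier computation itself is a level-by-level topological sort: repeatedly collect every task whose dependencies are all complete. A minimal sketch (illustrative, not the toolkit's actual scheduler):

```python
# Sketch: compute parallelization tiers from TASK dependency arrays
# (Kahn-style level-by-level topological grouping).
def tiers(deps: dict[str, list[str]]) -> list[list[str]]:
    remaining = dict(deps)
    done: set[str] = set()
    levels: list[list[str]] = []
    while remaining:
        # a task is ready when all of its dependencies are already done
        ready = sorted(t for t, d in remaining.items() if set(d) <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)  # everything in this tier may run in parallel
        done.update(ready)
        for t in ready:
            del remaining[t]
    return levels

tasks = {
    "TASK-001": [], "TASK-002": [],
    "TASK-003": ["TASK-001"], "TASK-004": ["TASK-002"],
    "TASK-005": ["TASK-003", "TASK-004"],
}
print(tiers(tasks))
# [['TASK-001', 'TASK-002'], ['TASK-003', 'TASK-004'], ['TASK-005']]
```

Note the cycle check: a malformed DAG is caught before any agent is spawned, which is exactly what the Phase 2 "Dependency DAG validity" gate verifies.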


A single reviewer has blind spots — a reviewer strong in security misses performance issues; focusing on architecture means overlooking edge cases. Production systems run 3 specialist reviewers in parallel:

| Reviewer | Review Area | Severity Criteria |
| --- | --- | --- |
| Correctness Reviewer | Logic errors, race conditions, security vulnerabilities, edge cases | Critical: data loss / security violation |
| Quality Reviewer | Naming, pattern consistency, duplicate code, hardcoded config | Major: maintainability degradation |
| Architecture Reviewer | Layer separation, separation of concerns, test coverage, API compliance | Major: structural debt |

Severity scale: Critical > Major > Minor > Nit. Any Critical finding means FAIL — feedback is automatically sent back to the coder.

On top of this 3-parallel review pattern sits a 2-stage structure:

  1. /reflect (self-review): The coder agent reviews its own code first, catching obvious mistakes to reduce the burden on independent review.
  2. /review (independent review): Three reviewers in parallel, with no knowledge of the coder’s reasoning.

Boris’s /simplify is a lightweight version of this pattern — same parallel review principle, but catching only structural issues without domain specialization. This design is implemented in Python in Week 9.
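The FAIL-on-any-Critical rule can be sketched as a small aggregation function over reviewer findings. The field names here are illustrative, using the severity scale above:

```python
# Sketch: aggregate findings from the three parallel reviewers into a
# single PASS/FAIL verdict. Any Critical finding fails the review.
SEVERITY_ORDER = ["Nit", "Minor", "Major", "Critical"]  # ascending severity

def verdict(findings: list[dict]) -> str:
    worst = max(
        (f["severity"] for f in findings),
        key=SEVERITY_ORDER.index,
        default="Nit",  # no findings at all → trivially PASS
    )
    return "FAIL" if worst == "Critical" else "PASS"

findings = [
    {"reviewer": "correctness", "severity": "Critical", "msg": "race in token refresh"},
    {"reviewer": "quality", "severity": "Minor", "msg": "duplicated helper"},
    {"reviewer": "architecture", "severity": "Major", "msg": "layer violation"},
]
print(verdict(findings))  # FAIL
```

On FAIL, the Critical findings would be routed back to the coder agent as structured feedback rather than free-form chat.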


Multi-agent systems have unique failure modes absent in single-agent setups:

| Failure Mode | Description | Mitigation Strategy |
| --- | --- | --- |
| Context Rot propagation | Context lost at each handoff (Week 5 reference) | Artifact-based handoffs — structured files preserve context |
| 17× error trap | Silent error compounding in unstructured agent networks | Centralized coordination + gated validation |
| Hallucination propagation | One agent’s hallucination becomes the next agent’s ground truth | Independent validation gate at each phase |
| Infinite refinement loop | QA→Coder→QA cycles without convergence | Retry cap (3×) + human escalation |
| State desynchronization | File conflicts between parallel agents | Git worktree isolation — each agent has an independent workspace |
| Cost explosion | Uncontrolled agent spawning | Concurrency cap (5 agents) + model tier routing (exploration: haiku, implementation: sonnet, review: opus) |
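Two of these mitigations — the concurrency cap and model tier routing — can be sketched together. `spawn_agent` is a hypothetical stand-in for the actual agent call; only the control structure matters:

```python
# Sketch: cost-explosion guards — cap concurrent agents with a semaphore
# and route each phase to an appropriate model tier.
import asyncio

MODEL_TIERS = {"exploration": "haiku", "implementation": "sonnet", "review": "opus"}
MAX_CONCURRENT_AGENTS = 5  # hard cap on simultaneously running agents

async def spawn_agent(task: str, phase: str, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most MAX_CONCURRENT_AGENTS run at once
        model = MODEL_TIERS[phase]
        await asyncio.sleep(0)  # placeholder for the actual (hypothetical) agent call
        return f"{task} handled by {model}"

async def main() -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT_AGENTS)
    # 6 tasks contend for 5 slots; the 6th waits for a slot to free up
    jobs = [spawn_agent(f"TASK-{i:03d}", "implementation", sem) for i in range(1, 7)]
    return await asyncio.gather(*jobs)

print(asyncio.run(main())[0])  # TASK-001 handled by sonnet
```

Routing cheap models to exploration and expensive ones to review keeps spend proportional to the value of each phase's output.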

  1. Explain the mechanism by which “bag of agents” amplifies errors 17.2× in the DeepMind study. Why does structured coordination reduce this to 4.4×?

  2. On SWE-Bench Pro, the same model shows a score difference of 45.9%–55.4% depending on the scaffolding. Use this data to argue the claim that “the harness matters as much as the model.”

  3. What are the trade-offs between passing natural-language messages between agents versus structured artifact (JSON/Markdown) handoffs? In which situations does each approach excel?

  4. In the /proceed pipeline, what is the rationale for escalating to humans after a maximum of 3 retries at each gate? What problems arise if the retry count is raised to 10?

  5. When applying the single-agent instruction tuning from Week 6 (CLAUDE.md) to a multi-agent system, how would you separate common rules from role-specific rules? Reference sdlc-toolkit’s conventions.md (common) and individual SKILL.md (role-specific) structure.


  1. Role Assignment Design

    Given a project specification, design the roles, responsibilities, and MCP tool access permissions for 5 agents (Planner, Architect, Coder, QA, Wrapup).

  2. Define Artifact Schemas

    Define the schema for every artifact passed between agents. Minimum 3 types: requirement spec, task file (with dependency array), pipeline state.

  3. Dependency DAG Design

    Decompose a given requirement into TASK files, draw the dependency graph, and identify tiers that can run in parallel.

  4. Validation Gate Design

    Define the verification checklist for each phase transition. Customize the /validate checklist above to fit your project.

  5. Error Recovery Scenarios

    Document recovery strategies for 3 failure scenarios (test failure, gate exceeding 3 retries, merge conflict).

Submission deadline: 2026-04-21 23:59

Requirements:

  1. 5-stage multi-agent architecture diagram (roles, artifacts, gates included)
  2. JSON schema definitions for inter-agent artifacts (minimum 3 types)
  3. Dependency DAG design and parallelization tier analysis
  4. Validation gate checklist (per phase)
  5. Error recovery strategy document (3 scenarios)

  1. Multi-Agent SDLC = role separation + structured handoffs + gated validation: The core is not simply running multiple agents, but assigning each agent a clear role and artifact contract.
  2. Bag of agents is harmful: DeepMind research — an unstructured agent collection amplifies errors 17.2×. Central coordination reduces this to 4.4×.
  3. The harness matters as much as the model: On SWE-Bench Pro, the same model shows a 10-percentage-point performance difference depending on scaffolding.
  4. Artifacts replace messages: Instead of direct messages between agents, structured files (requirement.md, TASK-xxx.md, pipeline-state.json) carry the handoffs.
  5. Gated pipeline: A validation gate at each phase transition. Maximum 3 retries before human escalation.
  6. Parallelization is controlled by the dependency DAG: Tier 0 (no dependencies) runs in parallel; Tier N (waits for Tier N-1) runs sequentially.
  7. Knowledge management completes the feedback loop: The domain/component tags in LESSON files automatically inject past lessons into future specs and architectures.
  8. 3-layer protocol stack: MCP (agent↔tools) + A2A (agent↔agent) + AG-UI (agent↔user) = the TCP/IP of agentic AI. MCP transitioning to Streamable HTTP (SSE deprecated 2026-06-30).
  9. Managed Agents as a third option: Anthropic cloud-hosted ($0.08/session-hour). Full pipeline (custom) vs native tools (lightweight) vs Managed Agents (enterprise) — a three-way spectrum.