
Agent OS Runtime

Agent OS Runtime treats agents as verifiable runtime systems, not just prompt-driven programs. This is a self-contained course unit. Students should be able to understand the architecture, contracts, implementation shape, and checklist without reading an external repository or separate subpages.

Core sentence:

MCP-first, provider-agnostic, Plan-Work-Review, event-sourced, Markdown-SSOT, hook-gated, schema-versioned — and a workflow plane on top.

| Problem | Runtime answer |
|---|---|
| Tool calls are hard to trace | Route all external capabilities through MCP-style dispatch |
| Provider changes break code | Keep thin adapters and common request/response contracts |
| Agent handoffs are fragile prose | Validate IPC with versioned JSON Schema |
| Long runs are hard to reconstruct | Use append-only event logs and replay |
| Operational policy is mixed into code | Externalize allow, deny, transform decisions through hooks |
| Prompts, skills, and role descriptions drift from code | Keep Markdown as source of truth |
| Multi-phase workflow policy is scattered across code | Separate cycle / phase / policy / persona / artifact as a workflow-plane SSOT (L8) |

| ID | Invariant | Meaning |
|---|---|---|
| I-1 | Replayability | Every LLM call and state transition is appended to an event log; snapshots are recalculated from the log |
| I-2 | Provider Sovereignty | Provider changes should be profile/config changes, not business logic rewrites |
| I-3 | Fail-closed Gates | Hooks, schema validation, and tool registration deny by default on failure |
| I-4 | Markdown is SSOT | Skills, roles, and project context are canonical Markdown artifacts |
| I-5 | Schema-versioned IPC | Messages between agents, runtime, hooks, and tools are schema-validated |

For a user request like "fix the bug in auth.ts", the runtime:

  1. appends session.start;
  2. runs the UserPromptSubmit hook;
  3. discovers matching Markdown skills;
  4. scopes tools to the selected skill’s allowed-tools;
  5. asks Lead for lead_directive_v1;
  6. asks Planner for plan_v1;
  7. lets Worker act through tool dispatch and return worker_report_v1;
  8. asks Reviewer for review_result_v1;
  9. runs the Stop hook;
  10. appends session.close and writes a replayable snapshot.

A representative event trace, from a session that selects the greet skill:

```
session.start
hook.fired UserPromptSubmit allow
skill.invoke greet
agent.transition lead
llm.request lead
llm.response lead
agent.transition planner
llm.request planner
llm.response planner
agent.transition worker
llm.request worker
tool.invoke echo
tool.result echo
llm.response worker
worker.report
agent.transition reviewer
llm.request reviewer
llm.response reviewer
hook.fired Stop deny
skill.complete greet ok
session.close ok
```

Requests that use the L8 workflow plane wrap the single-request flow above in a cycle. In addition to the core events, the runtime records workflow.started, workflow.phase_advanced, workflow.policy_gated, and workflow.completed or workflow.aborted; replay snapshots must restore the current cycle, phase, and verdict. An implementation without L8 is still a complete L1–L7 core runtime for this course.

The goal is not simply to “make the model work.” The goal is to place nondeterministic model calls inside a deterministic, observable, replayable system. L1–L7 is the core runtime that executes a single request safely; the optional L8 Workflow Plane sits on top to sequence multi-phase cycles (a 7+1 structure).

| Layer | Name | Responsibility |
|---|---|---|
| L1 | MCP-first Tool Protocol | Route external capabilities through a tool boundary with schema, hooks, and events |
| L2 | Provider-agnostic Completion | Normalize Anthropic, OpenAI, Gemini, and local models into one call shape |
| L3 | Plan-Work-Review Collaboration | Control multi-agent execution through role separation and state transitions |
| L4 | Append-only Event Store | Serve as audit log, replay source, and cost/progress source |
| L5 | Markdown-SSOT Skill Runtime | Discover skills, match triggers, and scope allowed tools |
| L6 | Hook-intercepted Lifecycle | Externalize security, redaction, rate limiting, and loop stop policy |
| L7 | Schema-versioned IPC Registry | Validate every boundary payload with versioned JSON Schema |
| L8 | Workflow Orchestration | Sequence the cycle / phase / policy / persona / artifact axes as Markdown SSOT |

```
L6  Hook Lifecycle                  intercepts everything below
L8  Workflow Orchestration          sequences cycles, phases, personas, policies
L5  Markdown Skill Runtime          selects skill and scopes tools
L3  Plan-Work-Review Agent Loop     orchestrates role state
L2  Provider-agnostic Completion    normalizes LLM calls
L1  MCP Tool Protocol               exposes external capability
------------------------------------ cross-cutting ------------------
L4  Append-only Event Store         records every state transition
L7  Schema-versioned IPC Registry   validates every boundary payload
```

L4 and L7 cut across every layer. A state change without an event, or a boundary payload without schema validation, is treated as an out-of-runtime side effect. L6 intercepts every layer below it in the stack, and L8 is a workflow plane layered on top of L1–L7. L8 inherits the same five invariants (I-1 through I-5) and decides not “what should one prompt do?” but “which phases run in which order until which verdict?”

All external capability goes through a tool registry.

```typescript
interface ToolDef {
  name: string;
  schema_in: string;
  schema_out: string;
  invoke(args: unknown): Result<ToolResult, ToolError>;
}

interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}
```

Execution order:

  1. find the tool in the registry;
  2. verify the caller’s allowed tool scope;
  3. validate args with schema_in;
  4. run PreToolUse;
  5. append tool.invoke;
  6. invoke the tool;
  7. append tool.result;
  8. run PostToolUse;
  9. validate the result with schema_out.

Forbidden patterns:

  • direct filesystem, subprocess, or HTTP calls from business code;
  • input schema without output schema;
  • continuing execution after hook denial.

Provider SDK objects stay inside thin adapters.

```typescript
interface CompletionRequest {
  messages: Array<{ role: string; content: string }>;
  model: string;
  tools?: ToolDef[];
  max_tokens?: number;
  temperature?: number;
  schema_version: "v1";
}

interface CompletionResponse {
  text: string;
  tool_calls: ToolCall[];
  finish_reason: "stop" | "tool_use" | "max_tokens" | "safety";
  usage: UsageInfo;
  model: string;
  schema_version: "v1";
}
```

Invariants:

  • a provider change stops at RuntimeProfile: switching backends edits the profile, not business code;
  • requests and responses are redacted before event logging;
  • retries stay inside adapter boundaries and domain failures return structured errors.
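A sketch of how a RuntimeProfile can confine a provider swap to configuration, assuming adapters are plain functions from a completion_request_v1 dict to a completion_response_v1 dict. The RuntimeProfile fields, the ADAPTERS table, mock_adapter, and complete() are illustrative names, not a real SDK; the mock's token counts are a crude word count. The mock path is what makes end-to-end tests deterministic without API keys.

```python
from dataclasses import dataclass
from typing import Callable

# Requests and responses are plain dicts shaped like completion_request_v1 / completion_response_v1.
Adapter = Callable[[dict], dict]


@dataclass(frozen=True)
class RuntimeProfile:
    backend: str  # e.g. "mock"; real backends would register their own adapters
    model: str


def mock_adapter(request: dict) -> dict:
    """Deterministic stand-in: echoes the last message, no network, no API key."""
    last = request["messages"][-1]["content"]
    text = f"mock:{last}"
    return {
        "text": text,
        "tool_calls": [],
        "finish_reason": "stop",
        "usage": {"input_tokens": len(last.split()), "output_tokens": len(text.split())},
        "model": request["model"],
        "schema_version": "v1",
    }


ADAPTERS: dict[str, Adapter] = {"mock": mock_adapter}


def complete(profile: RuntimeProfile, messages: list[dict]) -> dict:
    adapter = ADAPTERS.get(profile.backend)
    if adapter is None:
        # Unknown backends fail closed instead of silently falling back.
        raise LookupError(f"no adapter registered for backend {profile.backend!r}")
    request = {"messages": messages, "model": profile.model, "schema_version": "v1"}
    return adapter(request)
```

Business code only ever calls complete(profile, messages); changing providers means changing the profile and registering another adapter.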

| Role | Responsibility |
|---|---|
| Lead | Convert user request into intent and constraints |
| Planner | Produce verifiable steps and risks |
| Worker | Execute steps and report tool usage |
| Reviewer | Compare plan and worker report, then pass/fail |
| Advisor | Give outside advice for ambiguous/high-stakes requests |
| Scaffolder | Create stubs, tests, branches, or fixtures |

```
INTAKE -> LEAD -> PLAN -> WORK -> REVIEW -> DONE
                           ^        |
                           |        v
                        REWORK <- FAIL
```

Every transition appends agent.transition. Handoffs are schema-bound JSON, not free prose.
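The state diagram can be enforced as a transition table. This sketch assumes REWORK re-enters WORK (adjust the table if your variant re-plans instead) and takes an append_event callback in place of a real event store; AgentStateMachine and ALLOWED are hypothetical names. An illegal transition raises because it is a runtime invariant violation, not a domain failure.

```python
# Legal next states for each state; anything else is an invariant violation.
ALLOWED: dict[str, set[str]] = {
    "INTAKE": {"LEAD"},
    "LEAD": {"PLAN"},
    "PLAN": {"WORK"},
    "WORK": {"REVIEW"},
    "REVIEW": {"DONE", "FAIL"},
    "FAIL": {"REWORK"},
    "REWORK": {"WORK"},  # assumption: rework re-enters WORK
    "DONE": set(),
}


class AgentStateMachine:
    def __init__(self, append_event):
        self.state = "INTAKE"
        self._append = append_event  # callable(event_type, payload), e.g. wrapping EventStore.append

    def transition(self, to: str) -> None:
        if to not in ALLOWED.get(self.state, set()):
            raise RuntimeError(f"illegal transition {self.state} -> {to}")
        # Record first, then move: every transition leaves an agent.transition event.
        self._append("agent.transition", {"from": self.state, "to": to})
        self.state = to
```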

```typescript
interface Event {
  id: string;
  session_id: string;
  ts: string;
  type: string;
  actor: string;
  payload: unknown;
  schema_version: "v1";
  parent_id: string | null;
}
```

Rules:

  • each session owns sessions/{session_id}/.events.jsonl;
  • events are append-only;
  • corrections use new event.amended events;
  • snapshots must be recalculable from the event log;
  • replay creates no side effects. Calling LLMs or tools again is a re-run, not replay.

```yaml
---
name: greet
description: Echoes a greeting using the echo tool.
triggers:
  - greet
  - hello
allowed-tools:
  - echo
version: 1
schema_version: v1
---
```

Skill discovery:

  1. scan skill roots at startup;
  2. read SKILL.md frontmatter;
  3. validate with skill_frontmatter_v1;
  4. register by skill id;
  5. match user prompt triggers;
  6. pass selected allowed-tools to L1 dispatch.

Markdown is canonical. If code and skill text disagree, adjust the code to match Markdown.

| Hook | Purpose |
|---|---|
| SessionStart | session initialization policy |
| UserPromptSubmit | prompt redaction, reject, rewrite |
| PreToolUse | allow/deny/transform before tool execution |
| PostToolUse | observe or redact tool results |
| SkillStart | skill-specific policy |
| Stop | loop termination or continuation |
| PreCompact | protect information before compaction |

```typescript
type HookDecision =
  | { decision: "allow"; reason?: string }
  | { decision: "deny"; reason: string }
  | { decision: "transform"; output: unknown; reason?: string };
```

If a hook throws or times out, the default decision is deny. Hooks return decisions; they do not mutate runtime state directly.

```
completion_request_v1.json
completion_response_v1.json
lead_directive_v1.json
plan_v1.json
worker_report_v1.json
review_result_v1.json
skill_frontmatter_v1.json
write_claim_v1.json
```

Validate before provider requests, after provider responses, after role output parsing, during skill registration, at tool input/output boundaries, and after hook transforms.

v1 is frozen. Breaking changes require a new schema such as plan_v2.json.
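A SchemaRegistry sketch that fails closed on unknown ids and refuses to change a published schema in place. The check() function covers only the keywords the schemas in this unit use (type, required, properties, items, const, enum, minLength, minItems); it is a stand-in for a full Draft 2020-12 validator, in the spirit of the "minimal validation or deterministic checks" allowed for Go and TypeScript. For brevity it returns a list of error strings (empty means valid) rather than the Result types defined later; SchemaRegistry and check are hypothetical names.

```python
TYPES = {"object": dict, "array": list, "string": str, "number": (int, float),
         "integer": int, "boolean": bool}


def check(value, schema: dict, path: str = "$") -> list[str]:
    """Validate a subset of JSON Schema keywords; returns human-readable errors."""
    errors: list[str] = []
    if "const" in schema and value != schema["const"]:
        errors.append(f"{path}: expected const {schema['const']!r}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in enum")
    expected = schema.get("type")
    if expected:
        ok = isinstance(value, TYPES[expected])
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False  # bool subclasses int in Python, but JSON treats them separately
        if not ok:
            return errors + [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing {key}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
    if isinstance(value, list):
        if len(value) < schema.get("minItems", 0):
            errors.append(f"{path}: fewer than {schema['minItems']} items")
        for i, item in enumerate(value):
            errors += check(item, schema.get("items", {}), f"{path}[{i}]")
    if isinstance(value, str) and len(value) < schema.get("minLength", 0):
        errors.append(f"{path}: shorter than {schema['minLength']}")
    return errors


class SchemaRegistry:
    def __init__(self) -> None:
        self._schemas: dict[str, dict] = {}

    def register(self, schema: dict) -> None:
        sid = schema["$id"]
        if sid in self._schemas and self._schemas[sid] != schema:
            # Published versions are frozen; breaking changes need e.g. plan_v2.
            raise ValueError(f"{sid} is frozen; publish a new version instead")
        self._schemas[sid] = schema

    def validate(self, payload, schema_id: str) -> list[str]:
        if schema_id not in self._schemas:
            return [f"unknown schema {schema_id}"]  # fail closed on missing ids
        return check(payload, self._schemas[schema_id])
```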

L1–L7 are the core primitives that execute a single agent request deterministically. They do not, however, answer “what cycle do we start with today, and when does it end?” L8 Workflow Orchestration uses a single L3 Plan-Work-Review pass as a building block and defines the following five axes as Markdown + JSON Schema SSOT.

| Axis | Unit of definition | Location | Schema |
|---|---|---|---|
| Cycle | A user-intent bundle: phases, entry/exit/abort conditions, loop bounds | workflows/cycles/ | cycle_v1 |
| Phase | A step inside a cycle: advance_signal, halt_signal, personas invoked | workflows/phases/ | phase_v1 |
| Policy | A gating rule (allow / deny / advisory, default deny) | workflows/policies/ | policy_v1 |
| Persona | A domain specialist layered on the L3 six roles (reviewer / researcher / document-reviewer) | workflows/personas/ | skill_frontmatter_v1-compatible |
| Artifact | A deliverable template: naming rule, frontmatter | workflows/artifacts/ | per-template schema |

L8 calls L3; it never references L1 or L2 directly (L3 does). L8 inherits the same five invariants (I-1 through I-5), so workflow definitions must also be deterministic, auditable, and provider-agnostic.

```
USER REQUEST
  -> resolve cycle_id (from a skill or a lead_directive)
  -> WorkflowStart hook (allow / deny / transform)
  -> phase[0] = entry_phase
  -> loop:
       evaluate advance_signal / halt_signal
       if halt: WorkflowComplete (verdict=abort)
       fan-out personas in phase.agents_invoked
       fan-in findings -> aggregate
       evaluate policies in order
       PhaseAdvance hook (allow / deny / transform)
       phase = next phase
       if cycle.exit_conditions.done: WorkflowComplete (verdict=done)
       if cycle.loop_bounds breached: WorkflowComplete (verdict=halted)
```

Each phase may internally invoke a single L3 cycle. With one persona that becomes plain L3; with several personas it follows the fan-out → fan-in pattern.

| Kind | Decision unit |
|---|---|
| confidence-gating | Split actionable vs suppressed findings by confidence × severity |
| severity-routing | Route severity × autofix_class into apply / gated / manual / advisory |
| role-permissions | Restrict sub-persona dispatch and the paths a persona may write |
| mode-dispatch | Vary UX and deliverables by interactive / autofix / report-only / headless mode |
| loop-halt | Stop a bounded loop on max_generations, oscillation, or grade regression |

Each rule is a when / then / reason triple. If no rule matches, the policy's default (allow / deny / advisory) applies; if default is omitted, deny is the default-of-default (fail-closed). When reviewers disagree on the same fingerprint, the conservative choice wins (safe_auto < gated_auto < manual < advisory, and allow < deny).
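A sketch of rule evaluation and conservative merging. It assumes the first matching rule decides (the text only says policies are evaluated in order) and represents when as a Python predicate for brevity, whereas the Markdown SSOT would hold a deterministic expression. evaluate_policy, merge_gate, and merge_autofix are hypothetical names.

```python
# Conservativeness order for autofix classes: later entries are more conservative.
AUTOFIX_ORDER = {"safe_auto": 0, "gated_auto": 1, "manual": 2, "advisory": 3}


def evaluate_policy(policy: dict, facts: dict) -> tuple[str, str]:
    """Return (decision, reason) from the first matching rule, else the default."""
    for rule in policy.get("rules", []):
        if rule["when"](facts):  # predicate stand-in for a deterministic expression
            return rule["then"], rule["reason"]
    # An omitted default falls back to deny: the default-of-default is fail-closed.
    return policy.get("default", "deny"), "no rule matched"


def merge_gate(decisions: list[str]) -> str:
    # allow < deny: a single deny among reviewers wins.
    return "deny" if "deny" in decisions else "allow"


def merge_autofix(classes: list[str]) -> str:
    # Reviewers disagreeing on one fingerprint: the most conservative class wins.
    return max(classes, key=AUTOFIX_ORDER.__getitem__)
```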

| Hook | When |
|---|---|
| WorkflowStart | Just before a cycle begins: allow / deny / transform |
| PhaseAdvance | Just before a phase transition: allow / deny / transform |
| WorkflowComplete | Just after a verdict is decided: observe / record |

The audit events L8 emits:

```
workflow.started          cycle_id, session_id
workflow.phase_advanced   cycle_id, from_phase, to_phase
workflow.policy_gated     policy_id, decision, reason
workflow.completed        cycle_id, verdict
workflow.aborted          cycle_id, reason
```

LLM calls produced by persona fan-out are still recorded by L2 as llm.request / llm.response; L8 only adds workflow context on top.
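Replay for the workflow plane can fold workflow.* events into a snapshot without re-running anything. This sketch assumes events are dicts with type and payload keys carrying the fields listed above. Because workflow.started does not record the entry phase, it recovers the entry phase from the first from_phase, so a cycle that never advances reports phase None. replay_workflow is a hypothetical name.

```python
def replay_workflow(events: list[dict]) -> dict:
    """Fold workflow.* events into a snapshot; a pure function with no LLM or tool calls."""
    snap = {"cycle_id": None, "phase": None, "verdict": None, "visited": []}
    for event in events:
        kind, payload = event["type"], event["payload"]
        if kind == "workflow.started":
            snap["cycle_id"] = payload["cycle_id"]
        elif kind == "workflow.phase_advanced":
            if not snap["visited"]:
                snap["visited"].append(payload["from_phase"])  # recovered entry phase
            snap["visited"].append(payload["to_phase"])
            snap["phase"] = payload["to_phase"]
        elif kind == "workflow.completed":
            snap["verdict"] = payload["verdict"]
        elif kind == "workflow.aborted":
            snap["verdict"] = "abort"
        # Other event types (llm.*, tool.*, workflow.policy_gated) do not change this snapshot.
    return snap
```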

  • Zero code lines to add a cycle: adding workflows/cycles/<id>.md alone must be enough. If Python/Go/TS code must change, the design is broken.
  • Unknown IDs fail closed: unknown cycle/phase/policy/persona IDs are denied by the WorkflowRegistry. Allow-list registration only.
  • Personas live in their own Markdown: never inline a persona inside a cycle. Split into personas/<role>/<id>.md and reference by ID.
  • Signals are deterministic expressions: never delegate advance_signal to free-form natural-language evaluation. Use evaluable expressions like review_aggregate.p0_unresolved == 0.
  • Artifact naming lives in the template: do not describe naming as informal prose. Encode the pattern in the frontmatter of artifacts/<id>-template.md.
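A sketch of a deterministic signal evaluator for expressions shaped like review_aggregate.p0_unresolved == 0. It accepts only "path operator literal" (no eval(), no boolean combinators) and raises on anything else, so a natural-language signal cannot slip through. The grammar and evaluate_signal are assumptions for illustration.

```python
import operator
import re

OPS = {"==": operator.eq, "!=": operator.ne, "<=": operator.le,
       ">=": operator.ge, "<": operator.lt, ">": operator.gt}
# dotted.path, then a comparison operator, then a literal.
SIGNAL = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(==|!=|<=|>=|<|>)\s*(.+?)\s*$")


def parse_literal(text: str):
    if text in ("true", "false"):
        return text == "true"
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    return float(text) if "." in text else int(text)  # raises ValueError otherwise


def evaluate_signal(expr: str, context: dict) -> bool:
    """Evaluate `path op literal` against nested dicts; unknown paths raise."""
    match = SIGNAL.match(expr)
    if match is None:
        raise ValueError(f"unsupported signal expression: {expr!r}")
    path, op, raw = match.groups()
    value = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"unknown path segment {part!r} in {expr!r}")
        value = value[part]
    return OPS[op](value, parse_literal(raw))
```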

Contracts are execution requirements, not documentation decoration.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "completion_request_v1",
  "type": "object",
  "required": ["messages", "model", "schema_version"],
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["role", "content"],
        "properties": {
          "role": {"type": "string"},
          "content": {"type": "string"}
        }
      }
    },
    "model": {"type": "string"},
    "tools": {"type": "array"},
    "max_tokens": {"type": "integer"},
    "temperature": {"type": "number"},
    "schema_version": {"const": "v1"}
  }
}
```

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "completion_response_v1",
  "type": "object",
  "required": ["text", "tool_calls", "finish_reason", "usage", "model", "schema_version"],
  "properties": {
    "text": {"type": "string"},
    "tool_calls": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["id", "name", "args"],
        "properties": {
          "id": {"type": "string"},
          "name": {"type": "string"},
          "args": {"type": "object"}
        }
      }
    },
    "finish_reason": {"enum": ["stop", "tool_use", "max_tokens", "safety"]},
    "usage": {
      "type": "object",
      "required": ["input_tokens", "output_tokens"],
      "properties": {
        "input_tokens": {"type": "integer"},
        "output_tokens": {"type": "integer"},
        "cost_usd": {"type": "number"}
      }
    },
    "model": {"type": "string"},
    "schema_version": {"const": "v1"}
  }
}
```

```json
{
  "$id": "lead_directive_v1",
  "type": "object",
  "required": ["intent", "constraints", "schema_version"],
  "properties": {
    "intent": {"type": "string", "minLength": 1},
    "constraints": {"type": "array", "items": {"type": "string"}},
    "needs_advisor": {"type": "boolean"},
    "schema_version": {"const": "v1"}
  }
}
```

```json
{
  "$id": "plan_v1",
  "type": "object",
  "required": ["steps", "risks", "needs_advisor", "schema_version"],
  "properties": {
    "steps": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["id", "description"],
        "properties": {
          "id": {"type": "string"},
          "description": {"type": "string"},
          "depends_on": {"type": "array", "items": {"type": "string"}}
        }
      }
    },
    "risks": {"type": "array", "items": {"type": "string"}},
    "needs_advisor": {"type": "boolean"},
    "schema_version": {"const": "v1"}
  }
}
```

```json
{
  "$id": "worker_report_v1",
  "type": "object",
  "required": ["output", "steps_completed", "schema_version"],
  "properties": {
    "output": {"type": "string"},
    "steps_completed": {"type": "array", "items": {"type": "string"}},
    "tool_calls_made": {"type": "array", "items": {"type": "string"}},
    "schema_version": {"const": "v1"}
  }
}
```

```json
{
  "$id": "review_result_v1",
  "type": "object",
  "required": ["verdict", "feedback", "schema_version"],
  "properties": {
    "verdict": {"enum": ["pass", "fail"]},
    "feedback": {"type": "string"},
    "schema_version": {"const": "v1"}
  }
}
```

L8-aware implementations register the following additional schemas. The table below is the minimum contract students need: workflows/ Markdown frontmatter and emitted artifact JSON must satisfy these shapes.

| Schema id | Required / core fields | Role |
|---|---|---|
| cycle_v1 | id, phases, entry_phase, schema_version; optional exit_conditions, loop_bounds, policies | Bundles user intent into a phase sequence and exit conditions |
| phase_v1 | id, advance_signal, schema_version; optional halt_signal, agents_invoked, input_schema, output_schema, policies | A deterministic step inside a cycle |
| policy_v1 | id, kind, rules, schema_version; optional default | Decides allow, deny, or advisory through when / then / reason rules |
| finding_v1 | id, reviewer, severity, autofix_class, confidence, title, schema_version | One persona review finding |
| review_aggregate_v1 | findings, suppressed, pre_existing, verdict, schema_version; optional coverage, p0_unresolved | Fan-in result and gate input |
| brainstorm_v1 / plan_v1 (reused) / solution_v1 / learning_v1 / pulse_report_v1 | per-schema required fields | Per-cycle artifact deliverables |

finding_v1.severity is P0-P3, confidence is 0/25/50/75/100, and autofix_class is one of safe_auto, gated_auto, manual, advisory. The critical/major/minor/info severities used in the course labs can map to P0/P1/P2/P3.
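A sketch of finding normalization under these rules, assuming lab severities arrive as the strings critical / major / minor / info. normalize_finding and p0_unresolved are hypothetical helpers; p0_unresolved simply counts P0 findings and assumes the list already excludes resolved ones.

```python
LAB_TO_P = {"critical": "P0", "major": "P1", "minor": "P2", "info": "P3"}
SEVERITIES = {"P0", "P1", "P2", "P3"}
CONFIDENCE_LEVELS = {0, 25, 50, 75, 100}
AUTOFIX_CLASSES = {"safe_auto", "gated_auto", "manual", "advisory"}


def normalize_finding(raw: dict) -> dict:
    """Map lab severities to P0-P3 and reject out-of-contract values (fail closed)."""
    severity = LAB_TO_P.get(raw["severity"], raw["severity"])
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity {raw['severity']!r}")
    if raw["confidence"] not in CONFIDENCE_LEVELS:
        raise ValueError(f"confidence must be one of {sorted(CONFIDENCE_LEVELS)}")
    if raw["autofix_class"] not in AUTOFIX_CLASSES:
        raise ValueError(f"unknown autofix_class {raw['autofix_class']!r}")
    return {**raw, "severity": severity, "schema_version": "v1"}


def p0_unresolved(findings: list[dict]) -> int:
    # Feeds review_aggregate.p0_unresolved, the kind of value an advance_signal tests.
    return sum(1 for f in findings if f["severity"] == "P0")
```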

```json
{
  "$id": "skill_frontmatter_v1",
  "type": "object",
  "required": ["name", "description"],
  "properties": {
    "name": {"type": "string"},
    "description": {"type": "string"},
    "triggers": {"type": "array", "items": {"type": "string"}},
    "allowed-tools": {"type": "array", "items": {"type": "string"}},
    "plugin": {"type": "string"},
    "version": {"type": "integer"},
    "schema_version": {"const": "v1"}
  }
}
```

```json
{
  "$id": "write_claim_v1",
  "type": "object",
  "required": ["who", "why", "expected_hash", "schema_version"],
  "properties": {
    "who": {"type": "string"},
    "why": {"type": "string"},
    "expected_hash": {"type": "string"},
    "schema_version": {"const": "v1"}
  }
}
```
```markdown
# Lead role
You are the lead agent. Your job is to receive a user request and turn it into a lead_directive_v1 JSON payload.
Return JSON:
{
  "intent": "<one-sentence statement of what the user wants>",
  "constraints": ["<constraint 1>", "<constraint 2>"],
  "needs_advisor": false,
  "schema_version": "v1"
}
Rules:
- Set needs_advisor true only when the request is ambiguous, high stakes, or scope-defining.
- Constraints capture explicit must/must-not statements from the user prompt.
- Output JSON only.
```

```markdown
# Planner role
Decompose a lead_directive_v1 into a step-by-step plan.
Return JSON matching plan_v1:
{
  "steps": [
    {"id": "s1", "description": "<what to do>", "depends_on": []}
  ],
  "risks": ["<risk 1>"],
  "needs_advisor": false,
  "schema_version": "v1"
}
Rules:
- Steps must be independently verifiable.
- depends_on lists step ids that must complete first.
- risks captures ambiguity, missing context, unavailable dependencies, or likely failure.
- Output JSON only.
```

```markdown
# Worker role
Execute the plan_v1 steps. Use available tools declared by the selected skill.
Return JSON matching worker_report_v1:
{
  "output": "<final result>",
  "steps_completed": ["s1", "s2"],
  "tool_calls_made": ["echo:t1"],
  "schema_version": "v1"
}
Rules:
- Use tools through dispatch_tool only.
- If a step fails, stop and report partial progress.
- Do not fabricate output or tool calls.
- Output JSON only.
```

```markdown
# Reviewer role
Verify a worker_report_v1 against the plan_v1 it was supposed to fulfill.
Return JSON matching review_result_v1:
{
  "verdict": "pass",
  "feedback": "<reason>",
  "schema_version": "v1"
}
Rules:
- verdict pass only if every plan step has a corresponding completed step and output addresses the intent.
- verdict fail requires concrete feedback.
- The runtime should prevent a role from reviewing its own output.
- Output JSON only.
```
```markdown
---
name: greet
description: Echoes a greeting back to the user using the echo tool.
triggers:
  - greet
  - hello
  - hi
allowed-tools:
  - echo
plugin: skeleton
version: 1
schema_version: v1
---
# Greet skill
When a user prompt contains "greet", "hello", or "hi", this skill takes over.
1. Worker constructs a greeting string from the user's name, defaulting to "world".
2. Worker invokes the echo MCP tool with the greeting.
3. Worker reports the echoed value as final output.
4. Reviewer verifies output is non-empty.
```
```python
import re

from unified.hooks import HookDecision, HookHandler


def make_pre_tool_use_handler() -> HookHandler:
    # Baseline PreToolUse handler that allows every call.
    def fn(input: dict) -> HookDecision:
        return HookDecision(decision="allow", reason="default-allow")
    return HookHandler(id="pre_tool_use_default", fn=fn, priority=10)


def make_deny_dangerous_tools_handler() -> HookHandler:
    deny = {"rm", "delete", "drop_table"}

    def fn(input: dict) -> HookDecision:
        if input.get("tool") in deny:
            return HookDecision(decision="deny", reason=f"tool {input['tool']} on denylist")
        return HookDecision(decision="allow")
    return HookHandler(id="deny_dangerous_tools", fn=fn, priority=100)


def make_stop_handler() -> HookHandler:
    # Skeleton behavior: Stop always denies continuation, so the loop ends after one pass.
    def fn(input: dict) -> HookDecision:
        return HookDecision(decision="deny", reason="loop reached terminal state")
    return HookHandler(id="stop_after_one", fn=fn, priority=10)


def make_user_prompt_submit_handler() -> HookHandler:
    # Replace bare 16-digit runs (suspected card numbers) before the prompt reaches any model.
    def fn(input: dict) -> HookDecision:
        prompt = input["prompt"]
        redacted = re.sub(r"\b\d{16}\b", "[REDACTED-CC]", prompt)
        if redacted != prompt:
            return HookDecision(
                decision="transform",
                output={"prompt": redacted},
                reason="redacted suspected card number",
            )
        return HookDecision(decision="allow")
    return HookHandler(id="redact_pii", fn=fn, priority=80)
```
  • Role outputs are JSON only.
  • Schema validation failure fails closed.
  • Invalid skill frontmatter prevents skill registration.
  • Tool calls must be inside the selected skill’s allowed-tools.
  • Hook transform outputs must pass the next boundary schema.
  • Event payloads are routed by event type schema.

This guide defines a course reference runtime, not a production framework. The goal is semantic parity across implementations, not feature count.

```
agent-runtime/
├── schemas/
├── agents/
├── skills/greet/SKILL.md
├── hooks/
├── runtime/
│   ├── event_store.py
│   ├── schema.py
│   ├── mcp.py
│   ├── provider.py
│   ├── skills.py
│   ├── hooks.py
│   └── agents.py
└── sessions/
    └── sess_01/.events.jsonl
```

The shared contract surface is schemas/, agents/, and skills/. If Python, Go, and TypeScript fork these files, parity breaks.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")
E = TypeVar("E")


@dataclass
class Ok(Generic[T]):
    value: T


@dataclass
class Err(Generic[E]):
    error: E


@dataclass
class DomainError:
    kind: str
    code: str
    message: str
    retryable: bool = False
```

Return domain failures as Err(DomainError). Reserve exceptions for runtime invariant violations.

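```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from uuid import uuid4


@dataclass
class Event:
    id: str
    session_id: str
    ts: str  # ISO-8601 UTC string, matching ts: string in the Event interface
    type: str
    actor: str
    payload: dict
    schema_version: str = "v1"
    parent_id: str | None = None


class EventStore:
    def __init__(self, root: Path):
        self.root = root

    def _path(self, session_id: str) -> Path:
        return self.root / session_id / ".events.jsonl"

    def append(self, session_id: str, type: str, actor: str, payload: dict, parent_id: str | None = None) -> Event:
        event = Event(
            id=str(uuid4()),
            session_id=session_id,
            ts=datetime.now(timezone.utc).isoformat(),
            type=type,
            actor=actor,
            payload=payload,
            parent_id=parent_id,
        )
        path = self._path(session_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Append mode only: existing lines are never rewritten.
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(event), ensure_ascii=False) + "\n")
        return event

    def read_all(self, session_id: str) -> list[Event]:
        # Used by replay(); a session with no log yet reads as empty.
        path = self._path(session_id)
        if not path.exists():
            return []
        with path.open(encoding="utf-8") as f:
            return [Event(**json.loads(line)) for line in f if line.strip()]
```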
```python
def replay(events: EventStore, session_id: str) -> dict:
    snap = {
        "closed": False,
        "output": None,
        "cost": {"input": 0, "output": 0},
        "agents": {},
        "tools_invoked": [],
    }
    for event in events.read_all(session_id):
        if event.type == "llm.response":
            usage = event.payload.get("usage", {})
            snap["cost"]["input"] += usage.get("input_tokens", 0)
            snap["cost"]["output"] += usage.get("output_tokens", 0)
        elif event.type == "tool.invoke":
            snap["tools_invoked"].append(event.payload)
        elif event.type == "agent.transition":
            snap["agents"][event.actor] = event.payload.get("to")
        elif event.type == "worker.report":
            snap["output"] = event.payload.get("output")
        elif event.type == "session.close":
            snap["closed"] = True
    return snap
```
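```python
def dispatch_tool(call, mcp, schemas, hooks, events, session_id, allowed_tools):
    # Steps 1-2: registry lookup, then the selected skill's allowed-tools scope.
    tool = mcp.find_tool(call.name)
    if tool is None:
        return Err(DomainError("gate", "TOOL_NOT_FOUND", call.name))
    if call.name not in allowed_tools:
        return Err(DomainError("gate", "TOOL_NOT_ALLOWED", call.name))
    # Step 3: validate input before any hook or side effect runs.
    checked = schemas.validate(call.args, tool.schema_in)
    if isinstance(checked, Err):
        return checked
    # Step 4: PreToolUse may allow, deny, or transform the arguments.
    gate = hooks.fire("PreToolUse", {"tool": call.name, "args": call.args})
    if gate.decision == "deny":
        return Err(DomainError("gate", "GATE_DENIED", gate.reason))
    args = call.args
    if gate.decision == "transform":
        args = gate.output
        checked = schemas.validate(args, tool.schema_in)  # transforms must pass the boundary schema too
        if isinstance(checked, Err):
            return checked
    # Steps 5-7: record, invoke, record (tool.result links to tool.invoke via parent_id).
    invoke_event = events.append(session_id, "tool.invoke", "worker", {"id": call.id, "name": call.name, "args": args})
    result = tool.invoke(args)
    events.append(session_id, "tool.result", "tool", {"id": call.id, "name": call.name, "result": result}, parent_id=invoke_event.id)
    # Steps 8-9: PostToolUse observes the result, then validate it against schema_out.
    hooks.fire("PostToolUse", {"tool": call.name, "result": result})
    checked = schemas.validate(result, tool.schema_out)
    if isinstance(checked, Err):
        return checked
    return Ok(result)
```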
```python
def handle_user_request(runtime, prompt: str):
    session_id = runtime.new_session_id()
    runtime.events.append(session_id, "session.start", "user", {"prompt": prompt})

    def close(ok: bool, reason: str | None = None) -> None:
        runtime.events.append(session_id, "session.close", "system", {"ok": ok, "reason": reason})

    intake = runtime.hooks.fire("UserPromptSubmit", {"prompt": prompt})
    if intake.decision == "deny":
        close(False, intake.reason)
        return Err(DomainError("gate", "PROMPT_DENIED", intake.reason))
    if intake.decision == "transform":
        prompt = intake.output.get("prompt", prompt)

    # No matching skill means an empty tool scope (fail closed), not the whole registry.
    skill = runtime.skills.select(prompt)
    allowed_tools = set(skill.allowed_tools if skill else [])

    # Each role output is checked against the named schema; the first Err short-circuits the rest.
    lead = runtime.call_role("lead", prompt, schema="lead_directive_v1", session_id=session_id)
    plan = lead if isinstance(lead, Err) else runtime.call_role(
        "planner", lead.value, schema="plan_v1", session_id=session_id)
    report = plan if isinstance(plan, Err) else runtime.call_role(
        "worker", plan.value, schema="worker_report_v1", session_id=session_id, allowed_tools=allowed_tools)
    if isinstance(report, Ok):
        runtime.events.append(session_id, "worker.report", "worker", report.value)  # read by replay()
    review = report if isinstance(report, Err) else runtime.call_role(
        "reviewer", {"plan": plan.value, "report": report.value}, schema="review_result_v1", session_id=session_id)
    if isinstance(review, Err):
        close(False, review.error.code)
        return review

    ok = review.value["verdict"] == "pass"
    runtime.hooks.fire("Stop", {"session_id": session_id, "ok": ok})
    close(ok)
    return Ok({"session_id": session_id, "report": report.value, "review": review.value})
```

| Item | Python | Go | TypeScript |
|---|---|---|---|
| schema validation | real JSON Schema validation | minimal validation or deterministic checks | minimal validation or deterministic checks |
| event log | .events.jsonl append/read/replay | temp session log | test session log |
| provider | mock + real adapter boundary | deterministic mock | deterministic mock |
| skill discovery | YAML frontmatter validation | shared fixture read | shared fixture read |
| protected write | hash check + conflict | hash check + conflict | hash check + conflict |
| tests | end-to-end checklist | go test ./... | Node test runner |
| L8 workflow plane | workflows/ SSOT loader | core-only (optional) | core-only (optional) |

L1–L7 is the core compliance surface; L8 is an optional plane. L8-capable implementations must read the workflows/ directory (cycles/, phases/, policies/, personas/, artifacts/) as SSOT and add new cycles with zero code changes — that is the design invariant.
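A sketch of a fail-closed WorkflowRegistry loader, assuming each Markdown file under workflows/<axis>/ (personas may be nested by role) declares its id in a leading --- frontmatter block. Adding a cycle then means adding a file, while a malformed file, a duplicate id, or a lookup of an unknown id raises. WorkflowRegistry, read_id, and the id: key convention are assumptions.

```python
from pathlib import Path

AXES = ("cycles", "phases", "policies", "personas", "artifacts")


def read_id(path: Path) -> str:
    """Return the `id:` value from a Markdown file's leading --- frontmatter block."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError(f"{path}: missing frontmatter")
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        if key.strip() == "id" and value.strip():
            return value.strip()
    raise ValueError(f"{path}: frontmatter has no id")


class WorkflowRegistry:
    def __init__(self, root: Path) -> None:
        self._entries: dict[str, dict[str, Path]] = {axis: {} for axis in AXES}
        for axis in AXES:
            axis_dir = root / axis
            if not axis_dir.is_dir():
                continue
            for path in sorted(axis_dir.rglob("*.md")):
                ident = read_id(path)  # a malformed file aborts loading (fail closed)
                if ident in self._entries[axis]:
                    raise ValueError(f"duplicate {axis} id {ident!r}")
                self._entries[axis][ident] = path

    def get(self, axis: str, ident: str) -> Path:
        # Allow-list lookup: unknown axes or ids are denied, never guessed.
        try:
            return self._entries[axis][ident]
        except KeyError:
            raise LookupError(f"unknown {axis} id {ident!r}") from None
```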

```python
import hashlib
from pathlib import Path

# Ok, Err, and DomainError are the result types defined above.


def sha256_of(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def protected_write(path: Path, new_text: str, expected_hash: str, who: str, why: str):
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    # Always compare (fail closed); to create a new file, pass sha256_of("").
    if sha256_of(current) != expected_hash:
        return Err(DomainError("lock", "WRITE_CONFLICT", f"{who}: {why}"))
    path.write_text(new_text, encoding="utf-8")
    return Ok({"path": str(path), "hash": sha256_of(new_text)})
```
1. required schemas are loaded
2. greet skill is discovered
3. echo tool is registered
4. PreToolUse, Stop, UserPromptSubmit hooks are registered
5. provider can be swapped through RuntimeProfile
6. handle_user_request completes lead -> planner -> worker -> reviewer
7. .events.jsonl contains session, llm, agent, hook, skill, tool events
8. replay() reproduces closed snapshot and worker output
9. protected_write rejects stale expected_hash
10. invalid schema payload fails closed
11. UserPromptSubmit can redact a 16-digit sequence

Use this checklist for Lab 07 multi-agent pipelines, Lab 11 telemetry, and the Ralphthon capstone.

  • Event type and append-only .events.jsonl event store
  • replay() recalculates snapshots from the event log
  • SchemaRegistry loads and validates at least five schemas
  • RuntimeProfile selects provider backend and model through config
  • mock provider path enables deterministic end-to-end tests without API keys
  • MCP-style echo tool is registered
  • dispatch_tool() enforces allowed tools, schema, hooks, and events
  • at least three hooks: PreToolUse, Stop, UserPromptSubmit
  • Markdown skill discovery through skills/greet/SKILL.md
  • skill frontmatter validation and invalid skill skip
  • Plan-Work-Review loop: Lead -> Planner -> Worker -> Reviewer
  • role output schema validation
  • protected write conflict detection
  • schema violations fail closed rather than silently passing

A runtime that also claims L8 support must provide, on top of the core checklist:

  • a WorkflowRegistry that loads cycle / phase / policy / persona / artifact Markdown SSOT fail-closed
  • cycle_v1, phase_v1, policy_v1 schemas plus per-cycle artifact schemas
  • workflow.started, workflow.phase_advanced, workflow.policy_gated, workflow.completed, workflow.aborted events
  • WorkflowStart, PhaseAdvance, WorkflowComplete hook integration
  • replay snapshot reconstruction for current cycle, current phase, verdict, and visited phases
  • new cycles/phases/policies/personas can be added by writing Markdown alone (zero code lines)
  • unknown cycle/phase/policy/persona IDs deny fail-closed

A successful session should include at least:

session.start
hook.fired
skill.invoke
agent.transition
llm.request
llm.response
tool.invoke
tool.result
worker.report
skill.complete
session.close

Sessions that run through the L8 plane add the following on top:

workflow.started
workflow.phase_advanced (one per phase transition)
workflow.policy_gated (one per policy decision)
workflow.completed (verdict=done|halted)
workflow.aborted (verdict=abort)

Audit failure if any of these are missing:

  • all LLM request/response pairs;
  • all tool invoke/result pairs;
  • all agent state transitions;
  • all hook decisions;
  • all schema violations;
  • session close.
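A coarse audit check over a session's ordered event types. It counts request/response and invoke/result pairs rather than linking them through parent_id, and it cannot detect schema violations that were never logged, so treat it as a smoke test rather than a full audit. audit, PAIRS, and REQUIRED are hypothetical names, and requiring session.close to be the last event is an assumption.

```python
from collections import Counter

PAIRS = [("llm.request", "llm.response"), ("tool.invoke", "tool.result")]
REQUIRED = ["session.start", "agent.transition", "hook.fired", "session.close"]


def audit(event_types: list[str]) -> list[str]:
    """Return audit failures for one session (an empty list means the smoke test passed)."""
    problems = [f"missing {t}" for t in REQUIRED if t not in event_types]
    counts = Counter(event_types)
    for start, end in PAIRS:
        if counts[start] != counts[end]:
            problems.append(f"{start}/{end} unpaired: {counts[start]} vs {counts[end]}")
    if event_types and event_types[-1] != "session.close":
        problems.append("session.close is not the last event")
    return problems
```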

| Case | Expected behavior |
|---|---|
| unknown tool | TOOL_NOT_FOUND |
| tool outside selected skill | TOOL_NOT_ALLOWED |
| invalid role JSON | MALFORMED_AGENT_MESSAGE |
| missing schema id | invariant violation or schema error |
| hook timeout/exception | deny |
| stale file hash | WRITE_CONFLICT |
| provider retry exhausted | PROVIDER_ERROR |
| reviewer verdict fail | rework or human escalation |

| Anti-pattern | Why it is bad | Alternative |
|---|---|---|
| direct filesystem/network calls inside runtime | breaks audit and sandbox boundary | MCP tool dispatch |
| provider SDK object leaks into business logic | provider swap requires code edits | thin adapter |
| prose role handoff | downstream parsing is nondeterministic | versioned JSON Schema |
| in-place event edits | breaks replay and causality | append event.amended |
| skill registered in code | operational knowledge is locked in deploy artifact | Markdown discovery |
| hook directly mutates runtime state | mixes policy and state transition | return decision + append event |
| breaking schema v1 | old session replay breaks | add v2 schema |
| inlining a persona inside a cycle | breaks audit, coverage, and role-permissions | split into personas/<role>/<id>.md; cycle references the ID |
| natural-language advance_signal | signal is nondeterministic, replay/regression impossible | deterministic expression like review_aggregate.p0_unresolved == 0 |
| hard-coding policy inside Python | new policies are locked to a deploy cycle | register in workflows/policies/ and reference by ID |
| new cycle requires Python edits | the workflow plane collapses back into the code plane | add Markdown to workflows/cycles/ only |
| skipping L4 audit for a workflow transition | invariant violation: replay/audit broken | append a workflow.* event for every phase/policy transition |

| Criterion | Passing condition |
|---|---|
| Runtime boundary | tool/provider/skill/hook/schema/event responsibilities are separated |
| Contract discipline | role outputs and tool boundaries pass schema validation |
| Observability | .events.jsonl and replay snapshot submitted |
| Safety | at least two fail-closed cases tested |
| Determinism | repeatable end-to-end test through mock provider |
| Documentation | README includes commands, event examples, and known limitations |

```
assignments/lab-07/20230001/
├── schemas/
├── agents/
├── skills/greet/SKILL.md
├── runtime/
├── tests/
├── sessions/example/.events.jsonl
├── replay_snapshot.json
└── README.md
```

The README must include execution commands, mock-provider test instructions, at least 10 event trace lines, a way to reproduce schema violation or hook denial, and known limitations.

| Week | Connection |
|---|---|
| Week 03 | MCP is a capability boundary, not just a convenience API |
| Week 04 | Ralph Loop becomes operational when Stop hooks and event logs are added |
| Week 05 | Context reset is safe only with Markdown state and event replay |
| Week 06 | CLAUDE.md/PROMPT.md generalizes into a Markdown-SSOT skill runtime |
| Week 07 | The gated multi-agent SDLC pipeline generalizes as one instance of an L8 cycle/phase definition |
| Week 09 | The three-way parallel reviewer plus severity PASS/FAIL is an informal implementation of L8’s persona fan-out + severity-routing policy |
| Week 12 | Telemetry includes OpenTelemetry spans and replayable event logs; workflow.* events join the audit surface when L8 is used |
| Week 13–14 | Team runtime checklist can become the rubric. Teams that run multi-phase cycles may additionally submit L8 cycle/phase/policy Markdown SSOT |