Week 13: Capstone Project Design

Phase 5 · Week 13 · Advanced · Lecture date: 2026-05-26

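Theory

Today's Learning Objectives

Concept perspective

Explain, from a closed-loop system perspective, why Ralphthon differs from a hackathon, and apply the five conditions of a good capstone topic to your team's candidate topics.

Design perspective

Map the Agent OS Runtime L1–L7 core onto your capstone architecture and, if needed, design cycles, phases, and policies on the L8 workflow plane.

Implementation perspective

Divide code responsibilities among the Lead, Planner, Worker, Reviewer, and Operator roles, and write three ADRs and a five-item risk register.

Operations perspective

Work backward from the weeks 14-16 implementation, integration, and presentation schedule to explicitly decide the demo path and the scope cuts (Won't have).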

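What Is Ralphthon?

Ralphthon is a team capstone that solves real software problems using the Ralph loop methodology. Unlike a typical hackathon, the deliverable is not "a single app" but an agent system that can be run repeatedly.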

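Hackathon vs. Ralphthon: Structural Comparison

Axis	Typical hackathon	Ralphthon
Deliverable	One demo app	An agent system that can be run repeatedly
Success criterion	The demo works	The same task packet run 3 times gives consistent results
AI usage	Free-form (coding assistance)	task packet → harness → must pass the gate
Evaluation material	Demo video	event log, replay snapshot, gate results
Team composition	Free-form	Lead/Planner/Worker/Reviewer/Operator
Failure handling	Improvised	runbook + retry budget + escalation
Post-event operation	Usually ends	Reproducible through replay and ADRs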
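
The "same task packet run 3 times" criterion in the table can be checked mechanically. The sketch below is one possible reading, not the course's reference harness: it assumes "consistent" means the structured run result (for example the gate verdict) is identical across runs, and `run_task`, `fake_run`, and the result fields are hypothetical names introduced only for illustration.

```python
import hashlib
import json
from typing import Callable


def result_fingerprint(result: dict) -> str:
    """Hash a run result after canonical JSON serialization (key order ignored)."""
    canonical = json.dumps(result, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_consistent(run_task: Callable[[dict], dict], packet: dict, runs: int = 3) -> bool:
    """Run the same packet `runs` times and report whether every result matches."""
    fingerprints = {result_fingerprint(run_task(packet)) for _ in range(runs)}
    return len(fingerprints) == 1


def fake_run(packet: dict) -> dict:
    """Hypothetical stand-in for a real harness call; always returns the same verdict."""
    return {"task_id": packet["task_id"], "gate": "pass"}


print(is_consistent(fake_run, {"task_id": "capstone-017"}))  # True
```

Raw model text rarely matches byte for byte, so a real check would probably fingerprint only the fields the gate decides on rather than the full output.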

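Conditions for a Good Topic

Condition	Description	Bad example
Repeatable	The same type of work occurs many times	A demo that is used only once
Verifiable	Tests, a rubric, a judge, or human review can be applied	"Write me a good article"
Clearly bounded	The file, tool, and permission scope can be restricted	Operating on the whole internet at will
Recoverable from failure	A bad output can be rolled back or retried	Modifying the DB directly
Separated team roles	Planner, worker, reviewer, and operator are distinct	One person does everything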

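Capstone Structure Through the Agent OS Runtime

The week-13 design document must specify at least five of the following L1–L7 core layers. Only teams that want to design a multi-phase cycle explicitly add the L8 workflow plane.

Layer	What to define in the capstone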
L1 MCP Tool Protocol	허용/금지 도구, tool input/output schema, tool event
L2 Provider Completion	model profile, cost/latency 예산, fallback 규칙
L3 Plan-Work-Review Collaboration	Lead, Planner, Worker, Reviewer 상태 전이
L4 Event Store	`.events.jsonl`, replay snapshot
L5 Markdown-SSOT Skill Runtime	역할별 instruction, rubric, allowed tool scope
L6 Hook Lifecycle	approval, secret scan, loop stop, escalation hook
L7 Schema IPC Registry	task packet, worker report, review verdict, run report schema
Optional L8 Workflow Plane	cycle / phase / policy / persona / artifact Markdown SSOT

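For a detailed description of each layer, see Agent OS 7+1-Layer Architecture (L1–L7 core + L8 workflow plane). Teams designing a multi-phase cycle (for example, brainstorm → fix → ship) should also look at the five-axis model of the L8 Workflow Plane (cycle / phase / policy / persona / artifact). L8 is optional, however: the L1–L7 core alone satisfies the capstone evaluation criteria. L8 is for teams that want to organize cycle sequencing, policies, and personas as a Markdown SSOT.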

Core + L8 Mapping Diagram

Agent OS Runtime: L1–L7 Core + Optional L8

Optional L8 Workflow Plane: cycle · phase · policy · persona · artifact

L7 Schema IPC: task packet · report · verdict schemas

L6 Hook Lifecycle: approval · secret scan · loop stop

L5 Skill Runtime: Markdown instruction · allowed tools

L4 Event Store: .events.jsonl · replay snapshot

L3 Collaboration: Lead · Planner · Worker · Reviewer

L2 Provider Completion: model profile · fallback · budget

L1 MCP Tool Protocol: allowed tools · input/output schema

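Each core layer has its own deliverables. You do not have to build all of L1–L7 in week 13, but define at least five layers concretely and name a week-14 owner for each remaining layer. Teams that opt into L8 keep their cycle/phase/policy Markdown and workflow.* event evidence as separate deliverables.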
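
The week-13 rule (at least five core layers defined, a week-14 owner for the rest) is simple enough to check automatically. The following is a minimal sketch under assumed conventions: the `layer_map` structure and its `artifacts` and `week14_owner` keys are hypothetical and not part of the course material.

```python
CORE_LAYERS = ("L1", "L2", "L3", "L4", "L5", "L6", "L7")


def check_layer_map(layer_map: dict) -> list:
    """Return a list of problems; an empty list means the week-13 rule is met."""
    problems = []
    defined = [name for name in CORE_LAYERS if layer_map.get(name, {}).get("artifacts")]
    if len(defined) < 5:
        problems.append(f"only {len(defined)} core layers defined, need at least 5")
    for name in CORE_LAYERS:
        entry = layer_map.get(name, {})
        # A layer without artifacts is acceptable only if someone owns it for week 14.
        if not entry.get("artifacts") and not entry.get("week14_owner"):
            problems.append(f"{name} has neither artifacts nor a week-14 owner")
    return problems


# Hypothetical team mapping: five layers defined, two deferred with owners.
example_map = {
    "L1": {"artifacts": ["tools.allowlist.json"]},
    "L2": {"week14_owner": "Agent Engineer"},
    "L3": {"artifacts": ["roles.md"]},
    "L4": {"artifacts": ["run.events.jsonl"]},
    "L5": {"artifacts": ["skills/worker.md"]},
    "L6": {"week14_owner": "Harness Engineer"},
    "L7": {"artifacts": ["task_packet.schema.json"]},
}
print(check_layer_map(example_map))  # []
```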

Recommended Team Roles

Lead / Architect

Owns problem definition, scope management, and the final design. Prevents scope creep and freezes the acceptance criteria.

Harness Engineer

Implements the task packet, event store, policy gate, and retry/rollback logic.

Agent Engineer

Implements role prompts, tool policy, model routing, and CLI integration.

QA / Operator

Owns tests, the judge rubric, the telemetry dashboard, and the stability of the presentation demo.

Team Collaboration Topology

Capstone Team Topology

Lead / Architect: issues directives · manages scope

▼ directive

Planner Agent: generates spec / plan

▼ plan

Worker Agent: generates patch / artifact

▼ patch

Reviewer Agent: review verdict

▼ verdict

Operator / QA: gate integration · dashboard · presentation stability

▲ gate result → back to Lead

Event Store: tool events · review events · metrics all converge here → replay snapshot

The key point of this topology is that the Lead does not make every decision. The Operator watches real-time signals through metrics and the dashboard.
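
To make the Event Store node concrete, here is a minimal sketch of an append-only JSONL log folded into a replay snapshot. The event fields (`task_id`, `type`), the file name, and the snapshot shape are assumptions chosen for illustration; the course only fixes the `.events.jsonl` suffix and the idea of a replay snapshot.

```python
import json
import tempfile
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one event as a single JSON line; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, ensure_ascii=False) + "\n")


def replay_snapshot(log_path: Path) -> dict:
    """Fold the log into the latest event type per task plus a count per event type."""
    snapshot = {"latest_by_task": {}, "counts": {}}
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            event = json.loads(line)
            snapshot["latest_by_task"][event["task_id"]] = event["type"]
            snapshot["counts"][event["type"]] = snapshot["counts"].get(event["type"], 0) + 1
    return snapshot


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "capstone-017.events.jsonl"  # hypothetical file name
    append_event(log, {"task_id": "capstone-017", "type": "tool.call"})
    append_event(log, {"task_id": "capstone-017", "type": "review.verdict"})
    print(replay_snapshot(log))
```

Because the snapshot is derived only from the log, the Operator's dashboard and a later replay can both be rebuilt from the same file.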

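Deciding the MVP Scope

A good capstone does not declare a finished product from day one. Separate what must stay from what can be boldly dropped, as in the table below.

Category	Examples	Decision criterion
Must have	task packet, single worker loop, deterministic gate, event log	Without it, the loop is not closed
Should have	reviewer/judge, replay snapshot, simple dashboard	Strengthens the evidence for the final evaluation
Could have	web UI, multi-model router, agent marketplace	Only if time remains
Won't have	fully autonomous deployment, automated operation of external accounts, complex permission delegation	Risk and implementation cost are too high

The week-13 design document must include a Won't have list. A plan without scope cuts is a wish list, not an execution plan.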

Task Packet Design

Invoking an agent with a single natural-language sentence produces unstable results. In the capstone, every task is delivered as a task packet.

task_id: capstone-017
objective: "Add retry handling to the GitHub issue importer"
scope:
  files:
    - src/importer/github.py
    - tests/test_github_importer.py
allowed_tools:
  - read_file
  - edit_file
  - run_tests
acceptance:
  - "pytest tests/test_github_importer.py passes"
  - "No network call in unit tests"
  - "Retry count is configurable"
budget:
  max_turns: 6
  max_tokens: 120000
escalation:
  ask_human_if:
    - "API contract must change"
    - "Secret or credential is required"

Task packet rubric: good / borderline / anti-pattern

Item	Good	Borderline	Anti-pattern
Objective	Verb + measurable outcome	Vague verb	"Just make it better"
Scope	Files/directories listed explicitly	"Related code"	Entire repo
Allowed tools	Limited to read/edit/run_tests	Broad	"any"
Acceptance	3-5 pass/fail criteria	Only one	None
Budget	Both turns and tokens specified	Only one of them	Unlimited
Escalation	Condition + owner	Condition only	None
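
The Budget and Escalation rows only matter if the harness enforces them at run time. The loop below is a rough sketch of that enforcement, assuming a hypothetical `step` callable that performs one agent turn and reports its token use; `TurnResult` and the returned status strings are illustrative names, not an API defined by the course.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TurnResult:
    tokens_used: int
    done: bool = False
    escalation_reason: Optional[str] = None  # e.g. "API contract must change"


def run_with_budget(step: Callable[[int], TurnResult], max_turns: int, max_tokens: int) -> str:
    """Run turns until done, escalated, or a budget from the task packet is exhausted."""
    tokens = 0
    for turn in range(1, max_turns + 1):
        result = step(turn)
        tokens += result.tokens_used
        if result.escalation_reason:
            return f"escalate: {result.escalation_reason}"
        # The token check comes before the done check, so an over-budget run never counts as success.
        if tokens > max_tokens:
            return "stop: token budget exceeded"
        if result.done:
            return "done"
    return "stop: turn budget exceeded"


def three_turn_step(turn: int) -> TurnResult:
    """Hypothetical worker that finishes on its third turn."""
    return TurnResult(tokens_used=10_000, done=(turn == 3))


print(run_with_budget(three_turn_step, max_turns=6, max_tokens=120_000))  # done
```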

Task packet JSON Schema (summary)

{
  "type": "object",
  "required": ["task_id", "objective", "scope", "acceptance", "budget"],
  "properties": {
    "task_id": {"type": "string", "pattern": "^[a-z0-9-]{4,}$"},
    "objective": {"type": "string", "minLength": 10},
    "scope": {
      "type": "object",
      "properties": {
        "files": {"type": "array", "items": {"type": "string"}}
      }
    },
    "allowed_tools": {"type": "array", "items": {"type": "string"}},
    "acceptance": {
      "type": "array", "minItems": 1, "items": {"type": "string"}
    },
    "budget": {
      "type": "object",
      "required": ["max_turns", "max_tokens"],
      "properties": {
        "max_turns": {"type": "integer", "maximum": 15},
        "max_tokens": {"type": "integer"}
      }
    },
    "escalation": {"type": "object"}
  }
}

A team agreement made at the start of the capstone pays off: any task that cannot pass the schema validator, a one-line check in code, is not accepted.
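
As an illustration of that agreement, the sketch below hand-codes the main constraints of the summarized schema using only the standard library, so it runs without extra dependencies. It is deliberately partial (it does not inspect `scope.files` or `allowed_tools`), and a team would more likely feed the schema itself to a full JSON Schema validator library; `validate_packet` is a hypothetical helper name.

```python
import re

REQUIRED_FIELDS = ("task_id", "objective", "scope", "acceptance", "budget")


def _is_int(value) -> bool:
    # JSON Schema "integer" excludes booleans, while Python's bool subclasses int.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_packet(packet: dict) -> list:
    """Return a list of violations; an empty list means the packet is accepted."""
    errors = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in packet]
    if errors:
        return errors
    if not isinstance(packet["task_id"], str) or not re.fullmatch(r"[a-z0-9-]{4,}", packet["task_id"]):
        errors.append("task_id must match ^[a-z0-9-]{4,}$")
    if not isinstance(packet["objective"], str) or len(packet["objective"]) < 10:
        errors.append("objective must be a string of at least 10 characters")
    if not isinstance(packet["scope"], dict):
        errors.append("scope must be an object")
    acceptance = packet["acceptance"]
    if not isinstance(acceptance, list) or not acceptance or not all(isinstance(a, str) for a in acceptance):
        errors.append("acceptance must be a non-empty list of strings")
    budget = packet["budget"]
    if not isinstance(budget, dict):
        errors.append("budget must be an object")
    else:
        for key in ("max_turns", "max_tokens"):
            if not _is_int(budget.get(key)):
                errors.append(f"budget.{key} must be an integer")
        if _is_int(budget.get("max_turns")) and budget["max_turns"] > 15:
            errors.append("budget.max_turns must be at most 15")
    return errors


example_packet = {
    "task_id": "capstone-017",
    "objective": "Add retry handling to the GitHub issue importer",
    "scope": {"files": ["src/importer/github.py", "tests/test_github_importer.py"]},
    "acceptance": ["pytest tests/test_github_importer.py passes"],
    "budget": {"max_turns": 6, "max_tokens": 120000},
}
print(validate_packet(example_packet))  # []
```

With a helper like this, the one-line gate in the harness is simply a check that `validate_packet(packet)` returns an empty list before the packet reaches a worker.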

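Risk Assessment Framework

Do not describe risks abstractly; organize them in a table adapted from STRIDE.

Threat category	Capstone application	Example
Spoofing	Model identity	A different model responds under the same alias
Tampering	Task/artifact tampering	Results can be forged when event log post-processing is skipped
Repudiation	Evading accountability	Overrides happen anonymously
Information disclosure	Sensitive data leakage	Code or secrets are sent to an external API
Denial of service	Resource exhaustion	Infinite retries, queue overflow
Elevation of privilege	Exceeding permissions	Modifying files outside the allowed scope

For each item, write one line on how the current plan blocks it. If there is no mitigation, it becomes an item to build in week 14.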
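
As one example of such a one-line mitigation, the Elevation of privilege row can be blocked by comparing a worker's changed files against the `scope.files` list of its task packet. This is a sketch under the assumption that the harness can list changed paths as repo-relative POSIX strings; `out_of_scope` is a hypothetical helper, not part of any course-provided tooling.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list, allowed_files: list) -> list:
    """Return every changed path that is absolute, escapes the repo, or is not allowed."""
    allowed = {PurePosixPath(path) for path in allowed_files}
    violations = []
    for name in changed_files:
        path = PurePosixPath(name)
        if path.is_absolute() or ".." in path.parts or path not in allowed:
            violations.append(name)
    return violations


allowed = ["src/importer/github.py", "tests/test_github_importer.py"]
changed = ["src/importer/github.py", "src/secrets.py", "../etc/passwd"]
print(out_of_scope(changed, allowed))  # ['src/secrets.py', '../etc/passwd']
```

A non-empty result would be a natural place to reject the patch and emit an event, so the violation also appears in the replay snapshot.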

Architecture Decision Record

Each team records its important decisions as ADRs. At least three are required.

# ADR-001: Use vLLM OpenAI-compatible API

## Context
We need to compare local and commercial models through the same harness.

## Decision
Expose local models through vLLM's OpenAI-compatible API and route calls through one client wrapper.

## Consequences
- Good: model provider can be swapped without changing agent code.
- Bad: model-specific tool-call parsers still require configuration.

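Good ADR vs. Bad ADR

Aspect	Good ADR	Bad ADR
Context	"Why now" in 1-2 sentences	Company-wide vision
Decision	One sentence + reasons for rejecting alternatives	"Use X"
Consequences	Both Good and Bad	Good only
Owner	One accountable person	Anonymous
Date	Decision date	Empty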
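
Part of this comparison can be linted before review. The sketch below checks only what the ADR-001 sample shows (an `# ADR-NNN:` title, the three sections, and both Good and Bad consequences); it cannot judge the quality of the reasoning, and the `Good:` / `Bad:` labels are assumed from that sample rather than from a fixed ADR standard. Owner and Date checks could be added the same way once the team settles on where those fields live.

```python
import re

REQUIRED_SECTIONS = ("Context", "Decision", "Consequences")


def lint_adr(text: str) -> list:
    """Return structural problems found in a Markdown ADR; empty means it passes."""
    problems = []
    if not re.search(r"^# ADR-\d{3}: .+", text, flags=re.MULTILINE):
        problems.append("missing '# ADR-NNN: title' heading")
    sections = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^## (.+)$", line)
        if heading:
            current = heading.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            problems.append(f"missing section: {name}")
    if "Consequences" in sections:
        body = "\n".join(sections["Consequences"])
        for label in ("Good:", "Bad:"):
            if label not in body:
                problems.append(f"Consequences lacks a '{label}' item")
    return problems


sample_adr = """# ADR-001: Use vLLM OpenAI-compatible API

## Context
We need to compare local and commercial models through the same harness.

## Decision
Expose local models through vLLM's OpenAI-compatible API and route calls through one client wrapper.

## Consequences
- Good: model provider can be swapped without changing agent code.
- Bad: model-specific tool-call parsers still require configuration.
"""
print(lint_adr(sample_adr))  # []
```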

# Team Name Capstone Design

## 1. Problem
## 2. Users and Risk Boundary
## 3. Agent Architecture
## 4. Runtime Layers
## 5. Task Packet Schema
## 6. Evaluation Gates
## 7. Telemetry and Replay
## 8. Implementation Plan
## 9. Demo Scenario

Risk Register

Risks in the design document are not written abstractly. Each risk has a trigger, an owner, and a response.

Risk	Trigger	Owner	Response
Model output repeatedly violates the JSON schema	invalid JSON rate > 20%	Agent Engineer	Add structured output or a repair step
Quality cannot be evaluated because there are no tests	Deterministic gate is empty	QA / Operator	Write at least a smoke test and fixtures
Scope creep	Happy path not passing by week 14	Lead	Remove all Could have items
Cost overrun	Token budget per run exceeded	Harness Engineer	Apply max_turns, context trim, and cache prefix
Unstable demo	Same task fails at least once in 3 runs	QA / Operator	Shrink the live demo, prepare a recording

This table feeds the week-14 interim report and the week-15 release gate.
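
Because every trigger in the register is measurable, the five rows can be evaluated against run metrics instead of being reread by hand. The following is a sketch only: the metric names in `metrics` are hypothetical, and a real team would derive them from its own event log.

```python
def fired_triggers(metrics: dict) -> list:
    """Return (risk, owner) pairs for every register trigger that currently fires."""
    fired = []
    total = metrics["total_outputs"]
    if total and metrics["invalid_json_outputs"] / total > 0.20:
        fired.append(("schema violations", "Agent Engineer"))
    if metrics["deterministic_checks"] == 0:
        fired.append(("empty deterministic gate", "QA / Operator"))
    if metrics["week"] >= 14 and not metrics["happy_path_passing"]:
        fired.append(("scope creep", "Lead"))
    if metrics["tokens_last_run"] > metrics["token_budget"]:
        fired.append(("cost overrun", "Harness Engineer"))
    last_three = metrics["demo_runs_passed"][-3:]
    # The trigger needs three recorded runs of the same task with at least one failure.
    if len(last_three) == 3 and not all(last_three):
        fired.append(("unstable demo", "QA / Operator"))
    return fired


example_metrics = {
    "total_outputs": 50,
    "invalid_json_outputs": 12,  # 24% invalid, above the 20% trigger
    "deterministic_checks": 4,
    "week": 13,
    "happy_path_passing": False,
    "tokens_last_run": 90_000,
    "token_budget": 120_000,
    "demo_runs_passed": [True, False, True],
}
print(fired_triggers(example_metrics))
```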

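Practicum

Define the team problem

Pick one real recurring task and define the user in one sentence.

Write success criteria

Replace "good" with five pass/fail criteria.

Decompose agent roles

Choose only the roles you need from Lead, Planner, Worker, Reviewer, and Operator.

Map runtime layers

Decide which files, scripts, logs, and policies go into each layer.

Write the task packet schema

Attach one JSON Schema and three examples (good / borderline / anti-pattern).

Write three ADRs

Write one page each on model selection, the harness library, and the evaluation method.

Fix the demo path

Decide on one happy path and one failure recovery path to show in the final presentation.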

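Assignment

Capstone: Architecture Design Document

Submission deadline: 2026-06-02 23:59

Submission path: capstone/teams/[team-name]/design.md

Requirements:

Problem definition and user/risk boundary
Agent architecture diagram
Mapping of at least 5 of the 7 Agent OS Runtime layers
Task packet schema and three examples (good / borderline / anti-pattern)
Criteria for the deterministic gate, LLM Judge, and human review
Risk register with five items (trigger/owner/response)
At least three ADRs
Implementation plan for weeks 14-16 and a draft demo script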

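Key Takeaways

Ralphthon is not "building an app": it is where you design, operate, and prove a closed-loop system.
Five conditions of a good topic: repeatable / verifiable / clearly bounded / recoverable from failure / separated team roles.
You do not need to build all of the L1–L7 core: in week 13, specify at least five core layers and assign owners for the rest. L8 is an optional extension.
Won't have is the real plan: if you cannot write down what you will not build, the plan is a wish list.
Task packets are enforced by schema: instead of natural-language instructions, only packets that pass the JSON Schema validator reach the worker.
ADRs are the trace of a decision: what matters is not what you chose but what you rejected.
The risk register is the entry point to week 14: each risk must be defined as a trigger, owner, and response triple so it can be tracked the following week.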

Further Reading

Core documents

ADR / design culture

Michael Nygard, "Documenting Architecture Decisions" (the original ADR pattern)
ThoughtWorks Tech Radar: ADR entry

Evaluation / risk

Introduction to STRIDE threat modeling
Google SRE Workbook: Risk Analysis

Example capstone materials