Conceptual Perspective
Explain the difference between AI models and AI systems using the EU AI Act definition, and apply the “engine vs. automobile” analogy.
Governance Perspective
Distinguish between the three stages of HITL → HOTL → HIC architecture using real-world examples.
Industry Perspective
Understand the landscape of the 2025–2026 agentic AI tools ecosystem and the key benchmarks.
Methodology Perspective
Grasp the core principles of harness engineering and survey the full arc of the 15-week course.
The AI industry reached a fundamental inflection point in 2025–2026. The era of using LLMs as simple text-generation tools has ended, and autonomous agentic systems have become mainstream in software development.
Before (2023–2024)
After (2025–2026)
The numbers make the scale of the transition clear:
“AI models” and “AI systems” are different things. Understanding this distinction is the first gate of this course.
EU AI Act (2024) definition: An AI system is “a machine-based system that operates with varying levels of autonomy and that, based on machine- or human-provided inputs, infers from its inputs how to generate outputs such as predictions, recommendations, decisions, or content” (Article 3).
NIST AI RMF 1.0 definition: “An engineered system that generates predictions, recommendations, or decisions for a given set of objectives.”
One analogy to keep in mind: a model is an engine, a system is an automobile. The engine (LLM) provides the power, but it alone cannot reach a destination. You need steering (planning), brakes (safety), navigation (memory), and a dashboard (observability) to make it an automobile.
The 7 components of an AI system:
| # | Component | Role | Analogy |
|---|---|---|---|
| 1 | Foundation Model | Reasoning engine | Engine |
| 2 | Tool Use / APIs | External action capability | Wheels and steering |
| 3 | Memory | Short-term context + long-term knowledge | Black box + GPS history |
| 4 | Planning / Reasoning | Task decomposition, goal ordering | Navigation |
| 5 | Execution Environment | Sandbox, containers | Roads and lanes |
| 6 | Safety / Guardrails | Policies, constraints, human oversight | Brakes and airbags |
| 7 | Observability | Logging, evaluation, feedback loops | Dashboard and dashcam |
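The table above can also be read as a composition diagram. The following is a minimal, illustrative Python sketch of how the seven components fit together; all class and function names here are hypothetical, not taken from any real framework:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: each field mirrors one row of the component table.
@dataclass
class AgentSystem:
    model: Callable[[str], str]                            # 1. Foundation Model (engine)
    tools: dict[str, Callable]                             # 2. Tool Use / APIs
    memory: list[str] = field(default_factory=list)        # 3. Memory
    max_steps: int = 3                                     # 4. Planning: bounded iteration
    allowed_tools: set[str] = field(default_factory=set)   # 6. Guardrails
    log: list[str] = field(default_factory=list)           # 7. Observability

    def run(self, task: str) -> str:
        # 5. In a real system each step would execute in a sandboxed environment.
        self.memory.append(task)
        output = ""
        for step in range(self.max_steps):
            output = self.model("\n".join(self.memory))
            self.log.append(f"step {step}: {output}")
            if output in self.tools and output not in self.allowed_tools:
                self.log.append(f"blocked tool: {output}")  # guardrail veto
                break
            self.memory.append(output)
        return output

# Usage with a stub "model" that always answers the same thing:
system = AgentSystem(model=lambda ctx: "done", tools={})
print(system.run("demo task"))  # → done
```

The point of the sketch is structural: the model is one field among seven, which is exactly why regulating the engine alone does not certify the automobile.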
This is why the EU AI Act regulates models and systems under separate rules. The safety of an engine alone (model regulation) and the safety of the whole automobile (system regulation) are evaluated on different criteria. No matter how good the engine, a car without brakes is dangerous.
Let’s establish the theoretical roots of this course.
Rich Sutton’s Bitter Lesson (2019): “General methods that leverage computation are ultimately the most effective, and by a large margin.”
This principle explains the two axes of AI progress: scaling compute at training time (larger models, more data) and scaling compute at test time (more search and verification at inference).
The Ralph Loop and autoresearch we cover in Week 4 are external loop implementations of test-time compute. Instead of thinking deeper inside the model, the external harness repeatedly calls the model and verifies results. The principle is the same, but control belongs to us.
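What an "external loop implementation of test-time compute" means can be shown in a few lines. This is an illustrative sketch, assuming a stubbed model and verifier rather than a real API: the harness spends extra compute by sampling several candidates and keeping the first that passes an external check.

```python
import random

def external_best_of_n(model, verify, n=8, seed=0):
    """Call the model up to n times; return the first output that passes
    the external verifier, else None. The extra 'thinking' happens in the
    harness (more calls), not inside the model."""
    rng = random.Random(seed)
    for _ in range(n):
        candidate = model(rng)
        if verify(candidate):
            return candidate
    return None

# Stub model: guesses integers; external verifier: accepts multiples of 7.
model = lambda rng: rng.randint(0, 100)
result = external_best_of_n(model, lambda x: x % 7 == 0)
print(result)
```

Because the verifier lives outside the model, we decide what counts as success, how many attempts to pay for, and when to stop. That is the sense in which control belongs to us.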
| | HITL (Human-in-the-Loop) | HOTL (Human-on-the-Loop) | HIC (Human-in-Command) |
|---|---|---|---|
| Human role | Sequential gatekeeper | Real-time monitoring, exception intervention | Setting strategy and boundary conditions |
| AI autonomy | Low — approval required at each step | Medium — autonomous execution, alerts on anomalies | High — tactical execution delegated |
| Speed | Low (human is the bottleneck) | High (parallel processing possible) | Maximum (asynchronous work possible) |
| Risk level | Lowest (all actions verified) | Medium (monitoring gaps possible) | Context-dependent |
| Regulatory requirement | EU AI Act high-risk baseline | Telemetry + audit logs mandatory | Documentation of boundary conditions required |
| Real-world example | Production DB migration | CI/CD pipeline, AI code generation | Enterprise AI strategy, this course’s Ralphthon |
| Analogy | Manual transmission | Cruise control | Self-driving level 4 |
The three architectures are not mutually exclusive. Within a single system, different levels apply depending on the risk level of each task. For example, with the same AI coding agent:

- Production DB migrations run under HITL: every step needs explicit approval.
- Routine code generation in a CI/CD pipeline runs under HOTL: autonomous execution, with alerts on anomalies.
- Sprint-level goals are delegated under HIC: humans set the boundary conditions and review the outcomes.
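This task-by-task routing can be expressed as code. The risk labels and thresholds below are hypothetical placeholders, a minimal sketch of the idea rather than a policy recommendation:

```python
from enum import Enum

class Oversight(Enum):
    HITL = "approve each step"
    HOTL = "run autonomously, alert on anomalies"
    HIC = "set goals and boundary conditions only"

# Hypothetical policy: map a task's risk score to an oversight architecture.
def oversight_for(task: str, risk: float) -> Oversight:
    if risk >= 0.8:        # e.g. a production DB migration
        return Oversight.HITL
    if risk >= 0.3:        # e.g. AI code generation in a CI/CD pipeline
        return Oversight.HOTL
    return Oversight.HIC   # e.g. exploration within preset boundaries

print(oversight_for("drop legacy table", 0.9).name)    # → HITL
print(oversight_for("generate unit tests", 0.5).name)  # → HOTL
```

In a real harness the risk score would come from an explicit, auditable classification of the task, not a magic number; the point is that the oversight level is a function of the task, not of the system.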
The EU AI Act (2024) has become the global standard for AI governance. Let’s cover the key provisions.
Article 14 — Human Oversight obligations:
Related international standards are also being rapidly established:
South Korea’s AI Framework Act (effective January 2026) is more innovation-friendly than the EU AI Act, but shares the principle of human oversight for high-risk AI. We’ll compare the implementation-level differences between the two laws in detail in Week 2.
Let’s look at the need for governance not through abstract principles, but through data.
METR Study (July 2025, arXiv 2507.09089): A randomized controlled experiment with 16 skilled developers completing 246 real-world tasks. The results were surprising: developers took about 19% longer on tasks when using AI tools, even as they believed the tools were making them roughly 20% faster.
The core cause of this phenomenon is the “Babysitting Tax.” The cost of reviewing, fixing, and debugging AI-generated code exceeded the cost of writing it directly. A 2025 CodeRabbit analysis also found that AI-generated PRs had ~1.7× the issue rate.
Anthropic’s 2026 report shows a balanced picture: developers use AI for 60% of their work, but fully delegated work (AI from start to finish) accounts for only 0–20%.
The AI coding tools market has grown to USD 34.58B as of 2026. It divides into three categories by architecture:
Terminal-native — suited for headless automation (e.g., Claude Code, Gemini CLI, Codex CLI, OpenCode):

AI-native IDEs — editor integration, visual context (e.g., Cursor):

Cloud-native — remote execution, asynchronous tasks:
The top 3 tools (GitHub Copilot, Claude Code, Cursor) hold 70%+ market share, and MCP (Model Context Protocol) has emerged as the common protocol for tool integration: 97M+ monthly SDK downloads, 6,400+ registered servers.
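As a concrete example of what MCP integration looks like, most MCP-capable clients register servers through a small JSON configuration along the following lines. The file name, location, and project path vary by client (this sketch uses the official filesystem server package; check your tool's documentation for the exact format):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

The agent then discovers the server's tools at runtime over the protocol, which is why a single registry of 6,400+ servers can serve many different clients.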
Knowing the benchmarks for actual agent capabilities helps you distinguish real performance from inflated marketing claims.
SWE-bench Verified — benchmark for autonomously resolving real GitHub issues:
Real-world data is also accumulating:
A notable 2026 development is that open-source models have approached commercial-level quality. Of the 204 AI coding tools currently tracked, 95% are open-source.
Key open-source coding models:
Why we deploy open-source models in this course: cost, privacy (campus data stays local), and customizability (tuned to our learning environment). In Weeks 10–11 we deploy these models on the DGX H100 server.
The way we work with AI is rapidly evolving:
| Period | Paradigm | Core Skill |
|---|---|---|
| 2023 | Single prompt | Prompt engineering |
| 2024 | RAG pipelines | Retrieval-augmented generation, vector DBs |
| 2025 | Agent systems | Tool use, multi-agent |
| 2026 | Harness engineering | Loops, governance, observability |
Andrej Karpathy’s “Software 3.0” vision summarizes this shift: programmers move from directly writing code to becoming conductors of AI agents. Instead of playing instruments yourself, you lead the orchestra.
The required skills are also shifting:
Geoffrey Huntley’s core insight: “Build a stronger harness, not a stronger model.”
The Ralph Loop prototype is surprisingly simple:
```bash
while :; do cat PROMPT.md | <ai-coding-cli>; done
```

This one line works because of two mechanisms:
- Each iteration pipes `PROMPT.md` into a fresh CLI session, so every attempt starts from a clean context rather than an ever-growing chat history.
- `git checkout .` completely removes failed attempts. The context is not contaminated by the residue of failures.

OpenAI applied the same principle at scale. In a case study where Codex agents wrote 1M+ LOC without manual typing, the key was the harness, not the model: specs-as-code to enforce specifications, automatic layer architecture validation, and GC to restore a clean state on failure.
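The mechanics of the one-line loop can be made explicit in a harness sketch. This is an illustrative Python rendering, not a real CLI integration: the agent, verifier, and rollback are passed in as callables so the control flow is visible.

```python
def ralph_loop(read_prompt, run_agent, verify, rollback, max_iters=10):
    """Each iteration starts from a freshly read prompt (no accumulated
    chat context); failed attempts are rolled back so they leave no residue."""
    for i in range(max_iters):
        result = run_agent(read_prompt())  # fresh context every iteration
        if verify(result):
            return i, result               # only a verified attempt survives
        rollback()                         # e.g. `git checkout .` in the real loop
    return max_iters, None

# Simulation: the "agent" succeeds on its third attempt.
attempts = iter([False, False, True])
iters, result = ralph_loop(
    read_prompt=lambda: "PROMPT.md contents",
    run_agent=lambda prompt: next(attempts),
    verify=lambda r: r is True,
    rollback=lambda: None,
)
print(iters, result)  # → 2 True
```

Note that the loop never asks the model to judge its own output; `verify` is an external, deterministic check, which is the whole premise of harness engineering.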
| Phase | Weeks | Topic | Harness Layer |
|---|---|---|---|
| Phase 1 | 1–3 | Governance, infrastructure, protocols | Safety layer — brakes and airbags |
| Phase 2 | 4–6 | Loops, context management, instruction tuning | Control loop — engine and transmission |
| Phase 3 | 7–9 | Role assignment, planner, QA | Orchestration — navigation and autonomy |
| Phase 4 | 10–12 | Model deployment, evaluation | Infrastructure and observability — dashboard and maintenance |
| Phase 5 | 13–16 | Ralphthon capstone | Real-world validation — the driving test |
In Phase 1 we build the safety systems first, then in Phase 2 we add the engine. Order matters — accelerating without brakes is an accident.
After 15 weeks, you will have fully implemented the following system:
Install Node.js 20 LTS

```bash
# macOS (Homebrew)
brew install node@20

# Ubuntu/Debian
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
```

Install AI coding CLI tools → Set API keys → Test run

Claude Code:

```bash
# 1. Install (choose one)
brew install claude-code                # Homebrew (recommended)
pnpm add -g @anthropic-ai/claude-code   # pnpm

# 2. Set API key (add to ~/.bashrc or ~/.zshrc)
export ANTHROPIC_API_KEY="sk-ant-..."

# 3. Test run
mkdir ~/test-project && cd ~/test-project
claude "Hello! Please create a simple Python hello world file in this directory."
```

Gemini CLI:

```bash
# 1. Install
pnpm add -g @google/gemini-cli

# 2. Set API key (or authenticate via browser on first run)
export GEMINI_API_KEY="..."

# 3. Test run
mkdir ~/test-project && cd ~/test-project
gemini
# In interactive mode: "Please create a simple Python hello world file in this directory."
```

Codex CLI:

```bash
# 1. Install
pnpm add -g @openai/codex

# 2. Set API key
export OPENAI_API_KEY="sk-..."

# 3. Test run
mkdir ~/test-project && cd ~/test-project
codex "Please create a simple Python hello world file in this directory."
```

OpenCode:

```bash
# 1. Install
brew install opencode

# 2. Set API key (based on which model provider you use)
export OPENAI_API_KEY="sk-..."   # when using OpenAI models

# 3. Test run
mkdir ~/test-project && cd ~/test-project
opencode
# In TUI: "Please create a simple Python hello world file in this directory."
```

Observation exercise: Record AI autonomous decisions
Open the code the AI generated and observe the following:
This observation is the starting point of harness engineering — how do we control the non-deterministic output of AI in a deterministic way?
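One starting answer to that question: wrap the non-deterministic generation in deterministic checks. The sketch below uses only the standard library; the two checks are illustrative, assuming the prompt asked for a file defining `main`.

```python
import ast

def deterministic_checks(code: str) -> list[str]:
    """Run the same mechanical checks on any AI-generated Python file.
    The generation may vary from run to run; the pass/fail criteria never do."""
    failures = []
    try:
        tree = ast.parse(code)   # 1. output must be syntactically valid Python
    except SyntaxError as e:
        return [f"syntax error: {e}"]
    names = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    if "main" not in names:      # 2. output must define the entry point we asked for
        failures.append("missing function: main")
    return failures

good = "def main():\n    print('hello')\n"
bad = "def main(:\n"
print(deterministic_checks(good))  # → []
print(deterministic_checks(bad))   # first entry starts with "syntax error"
```

The list of checks grows over the course (tests, linters, architecture rules), but the shape stays the same: free-form output in, an unambiguous verdict out.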
The DGX server is protected by Cloudflare Zero Trust. You must install the WARP client and log in before connecting.
Install Cloudflare WARP
Download the client for your operating system from the Cloudflare WARP download page.
Log in to Zero Trust
Open WARP → click the settings gear → Preferences → Account → Login to Cloudflare Zero Trust → enter the team name → log in with your school email
SSH connection
With WARP connected, open a terminal and connect.
```bash
ssh {USER}@{SERVER_IP} -p {PORT}
# Initial password is your student ID — change it on first login!
```

Access https://jupyter.chu.ac.kr in your browser and log in with your student ID.
```bash
# After installing the VS Code Remote SSH extension
# Ctrl+Shift+P → Remote-SSH: Connect to Host
# → {SERVER_IP}:{PORT}
```

Due: 2026-03-10 23:59
Submission path: assignments/week-01/[student-ID]/ via PR
Requirements:
- `hello_agent.py` — a simple Python file generated with an AI coding CLI
- `README.md` — document any problems encountered during setup and how you solved them

Bonus:
Grading criteria:
Week 2 covers the concrete implementation of HOTL governance and EU AI Act compliance requirements. In particular, we practice “Governance-as-Code” — how to enforce governance policies through code.