Glossary

Key terms used in this course.
autoresearch : An autonomous ML experimentation loop published by Andrej Karpathy. An agent modifies train.py, measures val_bpb (validation bits per byte) after a fixed time budget (5 minutes), and commits if the metric improves or resets if it does not. It follows the same pattern as the Ralph Loop, but the validation condition is metric improvement rather than passing tests.
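The commit-or-reset decision at the heart of the loop can be sketched as follows. This is a hypothetical illustration, not Karpathy's actual script; the `run` parameter defaults to subprocess.run and is injectable so the gate itself can be tested without a git repository.

```python
import subprocess

def commit_or_reset(new_bpb, best_bpb, run=None):
    """autoresearch-style gate (illustrative sketch): keep the agent's edit
    only if validation bits-per-byte improved, otherwise hard-reset."""
    run = run or subprocess.run
    if new_bpb < best_bpb:  # lower val_bpb is better
        run(["git", "commit", "-am", f"val_bpb {new_bpb:.4f}"], check=True)
        return True
    run(["git", "checkout", "."], check=True)  # discard the failed attempt
    return False
```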
AI System : As defined in the EU AI Act (Article 3): a machine-based system that infers outputs from inputs and operates with varying levels of autonomy. It wraps an AI model (the engine) with tool use, memory, planning, an execution environment, safety mechanisms, and observability. If the model is the engine, the AI system is the car.
Agentic System : An AI system that autonomously performs real actions such as file modification, code execution, and API calls. It goes beyond simple text generation to interact with its environment.
Bitter Lesson : A principle articulated by Rich Sutton (2019): general methods that leverage computation ultimately prove more effective than specialized methods that leverage domain knowledge. Pre-training scaling (GPT-3 → GPT-4) is the “learning” side; test-time compute scaling (o1, DeepSeek R1) is the “search” side.
Backpressure : In the Ralph Loop, the mechanism by which the system automatically rejects agent output that does not meet criteria and forces a retry. Compilers, type checkers, and test suites are representative backpressure components.
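A minimal retry gate of this kind might look like the sketch below. The default validators shell out to pytest and git (assumed to be available); both are injectable, since the gate's own logic is what matters here.

```python
import subprocess

def accept_with_backpressure(run_agent, check=None, reset=None, max_retries=5):
    """Retry the agent until a deterministic validator accepts its output
    (illustrative sketch; `run_agent` is a hypothetical callable that
    edits the working tree)."""
    check = check or (lambda: subprocess.run(["pytest", "-q"]).returncode == 0)
    reset = reset or (lambda: subprocess.run(["git", "checkout", "."]))
    for attempt in range(1, max_retries + 1):
        run_agent()
        if check():   # backpressure: the test suite must pass
            return attempt
        reset()       # garbage-collect the rejected attempt
    raise RuntimeError("backpressure never released")
```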
Context Rot : The degradation of reasoning quality in long-running agents as the context window fills with failed attempts and stale code.
Context Window : The maximum number of tokens an LLM can process in a single pass. Approximately 200K tokens for Claude Sonnet 4.6.
CUD Operations : Create, Update, Delete operations. Classified as High Risk in HOTL governance, requiring a Hard Interrupt for human approval.
DeepSeek V3 : DeepSeek’s 685B MoE model with 37B active parameters. Top-tier performance on math, reasoning, and coding. Requires an 8×H100-class cluster.
DGX H100 : NVIDIA’s enterprise-grade AI server. The model installed in the Cheju Halla University AI Lab. Equipped with 8 H100 GPUs.
Elicitation : In MCP, a reverse hook by which a server requests confirmation or input directly from the user through the client. It implements Human-in-the-Loop at the protocol level. Used when a server needs user consent before a sensitive operation or needs to collect additional information.
Garbage Collection : In the Ralph Loop, the process of completely removing incorrect code generated by the agent via git checkout . and restoring the repository to a clean state.
Governance-as-Code : The approach of implementing governance policies (who can do what) as software code that is automatically enforced.
Hard Interrupt : A forced-stop mechanism that requires explicit human approval before an agent can attempt a High Risk operation (CUD).
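The gate can be sketched as below. This is an illustrative pattern, not a specific product's API; `approve` defaults to reading stdin but is injectable.

```python
CUD_OPS = {"create", "update", "delete"}  # High Risk per HOTL governance

def execute(op, action, approve=input):
    """Hard-interrupt sketch: a CUD operation blocks until a human
    explicitly approves it; anything else runs without interruption."""
    if op.lower() in CUD_OPS:
        if approve(f"High Risk op '{op}' - approve? [y/N]: ").strip().lower() != "y":
            raise PermissionError(f"{op} denied by human operator")
    return action()
```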
Harness : The deterministic external system that surrounds the Ralph Loop. Includes backpressure, garbage collection, and state tracking. Controls the non-deterministic output of the agent.
Harness Engineering : The engineering discipline of designing deterministic external systems to control non-deterministic AI agents. Its goal is to build a stronger harness rather than a stronger model.
HIC (Human-in-Command) : The highest governance layer of an AI system. Humans set overall strategy and boundary conditions; AI handles tactical execution.
HITL (Human-in-the-Loop) : An architecture in which human approval is required before every AI action. Safe but slow and hard to scale.
HOTL (Human-on-the-Loop) : An architecture in which the AI executes autonomously while humans monitor telemetry and intervene when necessary. High speed and scalability.
Instruction Tuning : The practice of correcting recurring agent errors by adding specific instructions to PROMPT.md, without retraining model weights.
LLM-as-Judge : A methodology that uses an LLM to automatically evaluate the output quality of another LLM, partially replacing human evaluators.
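In practice this usually means building a rubric prompt for the judge model and parsing a structured verdict out of its reply. The sketch below shows that scaffolding only; prompt wording and the SCORE format are illustrative, and the actual judge call is left out.

```python
import re

def judge_prompt(task, output):
    """Build a rubric prompt for a judge LLM (wording is illustrative)."""
    return (
        "You are an impartial evaluator.\n"
        f"Task: {task}\nCandidate output: {output}\n"
        "Rate correctness 1-5 and answer exactly as: SCORE: <n>"
    )

def parse_score(judge_reply):
    """Extract the numeric verdict from the judge model's reply."""
    m = re.search(r"SCORE:\s*(\d)", judge_reply)
    if not m:
        raise ValueError("judge reply missing SCORE")
    return int(m.group(1))
```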
MCP (Model Context Protocol) : A standardized protocol developed by Anthropic for connecting agents to external tools such as the filesystem, Git, and databases.
McpInject : The core module of the SANDWORM_MODE attack. It installs a malicious MCP server, registers three tools under innocuous names, and embeds prompt injections in the tool descriptions to manipulate AI assistants into autonomously collecting .ssh/id_rsa, .aws/credentials, etc. A semantic attack that exploits the AI’s language understanding rather than memory vulnerabilities.
MIG (Multi-Instance GPU) : NVIDIA technology that partitions a GPU into independent instances. Up to 7 instances on an H100. Provides hardware-level isolation.
MiniMax M2.1 : MiniMax’s 230B MoE model (10B active). Specialized for coding agents and tool use. Fully open weights.
MoE (Mixture of Experts) : An LLM architecture that activates only a subset of total parameters during inference, improving efficiency. Used in Qwen3-Coder, DeepSeek V3, MiniMax M2.1, and others.
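The routing idea behind "activates only a subset of parameters" can be shown in a few lines: a gate scores every expert, but only the top-k are run and their weights renormalized. This is a minimal sketch of top-k routing, not any specific model's implementation.

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts for this token and renormalize
    their softmax weights, so only k of n experts ever execute."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exp = [math.exp(gate_logits[i]) for i in top]
    z = sum(exp)
    return {i: e / z for i, e in zip(top, exp)}
```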
OBO (On-Behalf-Of) : An authentication pattern in which an MCP server operates using a delegated user/agent identity rather than a service account. Uses OAuth 2.1 token exchange to explicitly record “on whose behalf” an action is taken, resolving the accountability breakdown problem in agentic environments.
Qwen3-Coder : Alibaba’s 235B MoE coding-specialized model (22B active, 128K context). Performance approaching commercial models on SWE-bench. Apache 2.0 license.
GLM-4.7 : Zhipu AI’s ~32B Dense coding model. High reasoning quality via Interleaved Thinking. Can run on a single GPU. Available on HuggingFace/ModelScope.
SGLang : A leading open-source LLM inference framework alongside vLLM. Provides high throughput via RadixAttention-based KV cache reuse.
PagedAttention : The core technology of vLLM. Applies OS virtual memory paging to the KV cache, reducing memory waste to under 4%.
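The "under 4%" figure follows from allocating the KV cache in fixed-size blocks on demand, like OS pages: waste is bounded by one partial block per sequence. A toy calculation (block size is illustrative; vLLM's defaults differ by config):

```python
def blocks_needed(seq_len, block_size=16):
    """KV-cache blocks allocated for a sequence (ceiling division)."""
    return -(-seq_len // block_size)

def internal_waste(seq_len, block_size=16):
    """Unused token slots: at most block_size - 1, regardless of seq_len."""
    return blocks_needed(seq_len, block_size) * block_size - seq_len
```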
AI Coding CLI : The collective term for command-line tools that run AI agents in the terminal. Includes Claude Code, Gemini CLI, Codex CLI, and OpenCode. Can be automated in headless mode within a Ralph Loop.
RLM (Recursive Language Model) : A technique in which a model recursively calls itself to process long documents. The model loads a long prompt into a Python REPL variable, writes code to extract only the relevant parts, and calls itself recursively. Rather than expanding the context window, the model decides its own context navigation strategy.
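The recursion can be sketched as a map-then-recurse over chunks. Everything here is illustrative: `llm(prompt) -> str` is a hypothetical model call, and the real RLM technique has the model write its own navigation code in a REPL rather than following a fixed chunking loop.

```python
def rlm_answer(llm, question, doc, chunk=2000):
    """Recursive Language Model sketch: instead of stuffing `doc` into one
    context window, split it, extract what is relevant per chunk, then
    recurse on the concatenated extracts until the text fits."""
    if len(doc) <= chunk:
        return llm(f"Answer '{question}' using: {doc}")
    parts = [doc[i:i + chunk] for i in range(0, len(doc), chunk)]
    extracts = [llm(f"Extract text relevant to '{question}': {p}") for p in parts]
    return rlm_answer(llm, question, " ".join(extracts), chunk)
```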
Ralph Loop : An agentic development methodology popularized by Geoffrey Huntley in 2025. A simple persistent loop of the form while :; do cat PROMPT.md | <ai-coding-cli>; done. The <ai-coding-cli> can be claude, gemini, codex, or others.
/loop (Claude Code Loop) : Claude Code’s official schedule-based autonomous agent loop command. Runs as claude /loop "<instruction>" --every <interval> --for <duration>. Each iteration creates a git worktree for isolated execution and re-reads CLAUDE.md to reflect the latest context. Loops expire after at most 3 days, an intentional limit to prevent context drift in forgotten agents. A productization of the Ralph Loop’s general pattern, specific to Claude Code.
Ralphthon : An intensive development event in hackathon format centered on the Ralph Loop methodology. The name of this course’s capstone project.
Sampling : In MCP, a reverse hook by which a server requests inference from the client’s LLM. The server can leverage the host’s LLM intelligence without its own model API key. User approval is mandatory per the Human-on-the-Loop principle, and the client controls model selection and token limits.
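On the wire, a sampling call is a JSON-RPC request from server to client. The sketch below constructs one; the method and field names follow the MCP specification as I understand it (sampling/createMessage), but may differ across spec revisions.

```python
import json

def sampling_request(prompt, max_tokens=256, req_id=1):
    """Server-side MCP sampling request: JSON-RPC asking the *client* to run
    its own LLM. The client must obtain user approval before serving it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "sampling/createMessage",
        "params": {
            "messages": [{"role": "user",
                          "content": {"type": "text", "text": prompt}}],
            "maxTokens": max_tokens,
        },
    })
```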
SDLC (Software Development Lifecycle) : The standard phases of software development: requirements analysis → design → implementation → testing → deployment → maintenance.
Software 3.0 : A concept proposed by Andrej Karpathy in which programmers conduct (orchestrate) AI agents rather than writing code directly. The progression: Software 1.0 (manual coding), Software 2.0 (neural network training), Software 3.0 (agent orchestration).
TBAC (Task-Based Access Control) : A paradigm for controlling tool access based on the unit of an agent’s task purpose. Composed of three layers: Tasks → Tools → Transactions. Better suited to agentic environments than RBAC/ABAC, because access hinges on “what task” is being performed rather than “who” is performing it. A variable substitution engine supports mcp.* and jwt.* namespaces for dynamic policy evaluation.
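The Tasks → Tools layer reduces to a policy lookup in which identity never appears. The policy table and tool names below are illustrative, not any real product's schema.

```python
TASK_TOOLS = {  # Tasks -> Tools layer (illustrative policy)
    "summarize-report": {"fs.read", "llm.generate"},
    "deploy-service":   {"fs.read", "shell.exec", "k8s.apply"},
}

def authorize(task, tool):
    """TBAC check: a tool call is allowed only if the agent's declared
    task grants that tool; 'who' the agent is never enters the decision."""
    return tool in TASK_TOOLS.get(task, set())
```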
Test-Time Compute Scaling : A strategy for improving performance by investing more computation at inference time rather than increasing model size. Validated by OpenAI o1. The Ralph Loop, RLM, and autoresearch are all concrete realizations of this principle — calling the same model repeatedly while filtering results with deterministic validation.
Telemetry : The real-time collection of performance metrics, logs, and trace data from a running system.
Triple Gate Pattern : A three-layer defense architecture of AI Gateway → MCP Gateway → API Gateway. The first AI gateway filters prompt injections and PII; the second MCP gateway performs TBAC authorization; the third API gateway applies rate limiting. Each gate handles independent concerns, preventing a single point of failure.
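The independence of the three concerns shows up naturally as three separate checks chained in order, any one of which can stop the request. All three checks below are toy stand-ins (a string-match injection heuristic, a one-entry TBAC table, a bare counter), meant only to show the sequencing.

```python
ALLOWED = {"report": {"fs.read"}}  # toy TBAC policy for Gate 2

def ai_gate(prompt):
    """Gate 1 (AI Gateway): block an obvious injection marker (toy heuristic)."""
    if "ignore previous instructions" in prompt.lower():
        raise PermissionError("AI Gateway: injection blocked")

def mcp_gate(task, tool):
    """Gate 2 (MCP Gateway): TBAC authorization."""
    if tool not in ALLOWED.get(task, set()):
        raise PermissionError("MCP Gateway: tool not granted for task")

def api_gate(calls_this_minute, limit=60):
    """Gate 3 (API Gateway): rate limiting."""
    if calls_this_minute >= limit:
        raise PermissionError("API Gateway: rate limit exceeded")

def triple_gate(prompt, task, tool, calls_this_minute):
    """Each gate handles one independent concern; no single point of failure."""
    ai_gate(prompt)
    mcp_gate(task, tool)
    api_gate(calls_this_minute)
    return "forwarded"
```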
vLLM : An open-source high-throughput LLM inference library. Maximizes GPU memory efficiency via the PagedAttention algorithm.