
Week 6: Instruction Tuning

Phase 2 · Week 6 · Intermediate · Lecture: 2026-04-07

Why Instruction Tuning is Central in Week 6

In Week 5 we learned how to keep context clean — auto-compaction, fresh context, state files. But even with clean context, an agent that keeps repeating the same mistakes wastes those gains.

The core question this week: Can we correct agent behavior without changing model weights?

The answer is “instruction tuning” — correcting behavior by changing the environment (the instruction file) rather than the model. If Week 4’s Ralph loop AGENTS.md (cumulative learning) was passive record-keeping, this week’s PROMPT.md/CLAUDE.md tuning is active constraint design.

When we move to Week 7 (multi-agent), each agent will need its own PROMPT.md and permissions. Before that, we master the techniques for precisely controlling a single agent’s behavior.


When an agent makes recurring mistakes in the Ralph Loop, we correct its behavior by adding specific, deterministic instructions to PROMPT.md — without retraining model weights.

INSTRUCTION TUNING PROCESS

1. Identify recurring error patterns
2. Analyze the specific root cause of each error
3. Write a deterministic constraint statement
4. Add it to the permanent section of PROMPT.md
5. Verify the effect in the next loop

```markdown
# [Permanent Constraints — Never Ignore]

## ⚠️ Known Pitfalls (Instruction Tuning)

### 1. Do not call non-existent functions
- `utils.parse_json()` does not exist in this project
- Always use `json.loads()` directly
- Added: 2026-04-02 (recurring error at loop #47)

### 2. Do not commit without tests
- `pytest tests/ -q` must pass before any `git commit`
- CI will auto-rollback on failure

### 3. Type hints are mandatory
- All functions must include Python type hints
- `def add(a, b):` → `def add(a: int, b: int) -> int:`

---

# [Current Task]
...
```

Instruction File Design Principles — Lessons from 2,500 Repositories

GitHub Blog’s analysis of 2,500 open-source repositories (2026) and ETH Zurich’s AGENTbench research empirically identified the success factors and anti-patterns of instruction files.

5 Elements of Successful Instruction Files

  1. Executable commands first — place specific commands with flags (e.g., `pnpm test --coverage`) at the top of the file
  2. Code examples over prose — `def add(a: int, b: int) -> int:` is more effective than "write in a type-safe manner"
  3. Clear boundaries — explicitly state which files/directories must never be touched
  4. Specific stack versions — not "use Python" but "Python 3.12, FastAPI 0.115, Pydantic v2"
  5. Cover 6 key areas — build commands, tests, project structure, coding style, git workflow, prohibited actions
A useful way to structure boundaries is a 3-tier classification:

| Tier | Meaning | Examples |
|---|---|---|
| Always Do | Execute every time without fail | "Run pytest before committing", "Type hints mandatory" |
| Ask First | Request confirmation before executing | "Confirm with a human before DB schema changes", "Confirm before modifying external API keys" |
| Never Do | Absolutely prohibited | "Never commit `.env` files", "Never push directly to `main`" |
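The three tiers can be written directly into CLAUDE.md as a dedicated section. A minimal sketch built from the examples above (the section names are illustrative, not a required schema):

```markdown
## Always Do
- Run `pytest tests/ -q` before every commit
- Include Python type hints on all functions

## Ask First
- Confirm with a human before DB schema changes
- Confirm before modifying external API keys

## Never Do
- Never commit `.env` files
- Never push directly to `main`
```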

The AGENTIF benchmark (Tsinghua, 2025) raises an additional concern: existing instruction-following research evaluates models with short synthetic instructions averaging 45 words, but real-world agentic task instructions run hundreds to thousands of words. A model that follows short benchmark instructions well is not guaranteed to follow long real-world instructions — keep this in mind when evaluating the effectiveness of a 200-line CLAUDE.md.

CLAUDE.md is injected into the system prompt every session and every turn. Therefore there is a token budget:

  • The system prompt already occupies ~50 instructions
  • User instruction budget: ~150–200 instructions (roughly 200 lines)
  • Exceeding this budget causes the agent to start ignoring critical instructions

Pruning test: For each line, ask “Would Claude make a mistake without this?” If not, delete it.

Include: build/test commands, non-standard coding conventions, architecture decisions, prohibited lists.
Exclude: standard language rules (handled by linters), frequently changing information, long tutorials.

Skills are the solution for staying under 200 lines while still leveraging domain knowledge:

| Storage | Load Time | Purpose |
|---|---|---|
| CLAUDE.md | Automatic, every session | Project-wide rules (under 200 lines) |
| .claude/skills/*.md | On-demand, for relevant tasks | Domain-specific knowledge |

Example: API conventions, deployment procedures, DB migration rules don’t need to be loaded every time. Placing them in .claude/skills/ means they load only when needed — saving tokens while making specialized knowledge available.
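A skill file is just a Markdown document under `.claude/skills/`. A hypothetical example for DB migration rules (the filename and all rule content are illustrative, not from the source project):

```markdown
<!-- .claude/skills/db-migrations.md — hypothetical example -->
# DB Migration Rules

- Generate migrations with `alembic revision --autogenerate`
- Never edit an already-applied migration; create a new revision instead
- Run `alembic upgrade head` against the test DB before committing
```

Because this file is loaded only when a migration-related task comes up, none of it counts against the per-session CLAUDE.md budget.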

SkillReducer research (arXiv:2603.29919) found a “less-is-more” effect: compressing tool descriptions by 48% actually improved quality by 2.8%. Reducing information helps the agent focus on what matters.


To verify the effect of tuning (step 5 of the process), measure it:

| Metric | Measurement Method |
|---|---|
| Recurring error rate | Identical error occurrences / total loop count |
| Average loop count | Loops required to complete a task |
| Context efficiency | Ratio of tokens wasted on unnecessary exploration |
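The first metric is straightforward to compute from loop logs. A minimal sketch, assuming you have already normalized each failed loop's error into a signature string (the log format here is hypothetical — adapt it to your own `log_analyzer.py` output):

```python
from collections import Counter


def recurring_error_rate(error_signatures: list[str], total_loops: int) -> float:
    """Share of loops lost to errors that occurred more than once.

    error_signatures: one normalized signature per failed loop
    total_loops: total number of Ralph-loop iterations in the run
    """
    counts = Counter(error_signatures)
    # Only errors seen 2+ times count as "recurring"
    recurring = sum(n for n in counts.values() if n > 1)
    return recurring / total_loops


# Example: 10 loops, the same missing-function error hit 3 times
errors = ["utils.parse_json missing"] * 3 + ["flaky network test"]
rate = recurring_error_rate(errors, total_loops=10)  # 3 / 10 = 0.3
```

Tracking this number before and after a PROMPT.md change is the basis for the A/B comparison in the assignment.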

Output Styles and Cognitive Modes — Zero-Config Instruction Tuning

Adding constraints to PROMPT.md is like installing custom-made signs. In contrast, Claude Code’s Output Styles are pre-installed mode switches built into the agent. They change behavior patterns at zero configuration cost.

The key insight: output styles change the cognitive mode, not just the tone. Given the same task, the agent’s approach to the problem itself changes depending on the style.

```sh
claude --output-style explanatory "Improve error handling in this function"
```

The agent modifies code while explaining why it makes each change. Reasoning is already embedded in the PR, so reviewers only need to verify the decisions.

Best for: Team onboarding, junior engineers working in unfamiliar codebases, reducing code review turnaround

Boris Cherny’s team sets Explanatory as the default when junior engineers work in unfamiliar services. The result: PR review time dropped — reasoning arrives attached to the code, so reviewers no longer need to reconstruct the decision chain.

Effort Levels — Adjusting Reasoning Depth

If output styles change the direction, effort levels adjust the depth.

| Level | Use Case | Cost | Example |
|---|---|---|---|
| low | Simple lookups, type checks | Minimal | `claude --effort low "Return type of this function?"` |
| medium | General code changes | Moderate | `claude --effort medium "Add tests"` |
| high | Architecture design, complex debugging | High | `claude --effort high "Design async refactoring"` |
| max | Maximum reasoning depth (Opus 4.6 exclusive) | Maximum (10x+) | `claude --effort max "Security vulnerability analysis"` |

Practical effort level use in Ralph loops: start with low in early iterations (exploration phase), then switch to high for core implementation — this optimizes token cost across the loop.
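That policy can be encoded in the loop driver itself. A hypothetical sketch (the threshold of 3 exploration loops and the prompt text are illustrative; only the `--effort` flag comes from the table above):

```python
def effort_for_iteration(iteration: int, exploration_loops: int = 3) -> str:
    """Map a Ralph-loop iteration number to a --effort level.

    Policy from the text: cheap, shallow reasoning while exploring,
    deep reasoning once core implementation starts.
    """
    return "low" if iteration <= exploration_loops else "high"


# Build the CLI invocation for iteration i (command string only; not executed here)
i = 5
cmd = f'claude --effort {effort_for_iteration(i)} "Implement the next fix_plan.md item"'
```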

Custom Output Style — Per-Project Cognitive Mode

When the built-in styles (Explanatory/Learning/Concise) are insufficient, you can create a custom style:

```sh
# Create a new custom style
claude /output-style:new
```

In the generated Markdown file’s frontmatter, keep-coding-instructions: true/false controls whether existing coding instructions are preserved or fully replaced. This is a deeper level of instruction tuning than PROMPT.md — it replaces the coding-related portion of the system prompt itself.
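A sketch of what such a file might contain. Only `keep-coding-instructions` is documented in the text; the `name` and `description` fields and the body are illustrative assumptions:

```markdown
---
name: terse-reviewer
description: Minimal diffs with one-line rationales
keep-coding-instructions: true
---

Respond with the smallest viable diff. After each change, add a
one-line rationale. Do not restate the task or summarize files.
```

With `keep-coding-instructions: true`, this body supplements the built-in coding instructions; with `false`, it replaces them entirely.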

PROMPT.md Tuning vs Output Styles — When to Use Which

| Aspect | PROMPT.md Instruction Tuning | Output Style + Effort Level |
|---|---|---|
| Setup cost | High — error analysis → writing → verification | Zero — a single CLI flag |
| Customization | Unlimited — free-text project constraints | Limited — choose from preset list |
| Persistence | Permanent — written to file, git-tracked | Per-session — must specify each time |
| Sign metaphor | Custom-made signs installed on site | Factory-installed mode switches |
| Best for | Project-specific recurring error correction | General behavior pattern changes |

Hooks — Deterministic Enforcement Beyond CLAUDE.md

CLAUDE.md is advisory — it is followed roughly 80% of the time. For 100% enforcement, use Hooks.

Hooks are an automation mechanism that triggers shell commands / HTTP / LLM judgment on tool calls and session events. They are defined in ~/.claude/settings.json or the project’s .claude/settings.json.

| Type | Behavior | Use Case |
|---|---|---|
| command | Execute a shell command | Auto-format, lint check, log recording |
| http | POST to an HTTP endpoint | External service notification, CI trigger |
| prompt | Delegate judgment to an LLM (Haiku) | Automatically judge "task completion" |
| agent | Subagent reads files / runs commands | Complex verification like "did tests pass" |

Real Example: Auto-Formatting with PostToolUse

```json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Write",
      "command": "npx prettier --write $CLAUDE_FILE_PATH"
    }]
  }
}
```

This Hook runs Prettier automatically every time Claude writes a file. It is 100% reliable compared to writing “follow Prettier format” in CLAUDE.md.
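The same mechanism can gate actions before they run. A sketch of a PreToolUse hook that runs the test suite when the agent attempts `git commit`, mirroring the schema of the PostToolUse example above (the matcher syntax is taken from the discussion-question example below it; verify against your Claude Code version's hook schema):

```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash(git commit*)",
      "command": "pytest tests/ -q"
    }]
  }
}
```

If the command exits non-zero, the tool call is blocked — turning the advisory "do not commit without tests" rule into a deterministic one.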

Stop Hook for Automatic Ralph Loop Completion Verification

```json
{
  "hooks": {
    "Stop": [{
      "type": "prompt",
      "prompt": "Are all tasks complete? Check fix_plan.md for any incomplete items.",
      "model": "haiku"
    }]
  }
}
```

The prompt type Hook delegates judgment to the Haiku model. If it returns "ok": false, the agent continues working. This is the key pattern for automating the manual verification step in the Ralph loop.


Instruction files are a common pattern across all AI coding tools, but the implementation differs per tool:

| Tool | File | Hierarchy | Notable Features |
|---|---|---|---|
| Claude Code | CLAUDE.md | 5-level + Skills | `@import`, advisory (~80%), supplemented by Hooks |
| Cursor | .cursor/rules/ | Directory-based | Migrated from .cursorrules, glob pattern matching |
| Windsurf | .windsurf/rules/ | Directory-based | Cascade engine auto-detects context |
| Codex CLI | AGENTS.md | Tool-neutral | 60,000+ repo adoption, 25+ agent compatible |
| GitHub Copilot | .github/copilot-instructions.md | Single + path-specific | Org-level GA (2026-04), excludeAgent support |
| Gemini CLI | GEMINI.md | Hierarchical discovery | Also reads AGENTS.md, 1M context |
| JetBrains Junie | .junie/guidelines.md | Single file | IntelliJ platform integration |

GitHub Copilot shipped organization-level custom instructions as GA in April 2026. Admins can set default instructions in a .github repo that apply across all org repositories. Additionally, .github/instructions/*.instructions.md supports path-specific instructions (YAML frontmatter + glob patterns), enabling different rules per file type.
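A sketch of a path-specific instruction file, e.g. `.github/instructions/python.instructions.md` (the filename and rule text are illustrative; `applyTo` is the glob-pattern frontmatter key used by Copilot's path-specific instructions):

```markdown
---
applyTo: "**/*.py"
---

Use Python 3.12 syntax and include type hints on all functions.
Run `pytest tests/ -q` before proposing any commit.
```

Only files matching the glob pick up these rules, so per-language conventions stay out of the org-wide defaults.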


  1. What is the difference between writing “Do not commit without tests” in CLAUDE.md versus using a Hook to force pytest execution via PreToolUse: Bash(git commit*)? Which is more effective, and why?
  2. Explain why LLM-generated instruction files are actually harmful according to the ETH Zurich research. “Adding an architecture overview seems helpful — why doesn’t it work?”
  3. At what point does Sign Fatigue occur? How many lines do you predict PROMPT.md must exceed before its effect reverses? Discuss in connection with the SkillReducer research.
  4. Output styles (Explanatory/Learning/Concise) are said to change the agent’s cognitive mode. What does “changing cognitive mode” mean technically? Which part of the system prompt is being modified?
  5. In Week 7 multi-agent systems, if each agent needs a different PROMPT.md, how would you design the separation of shared parts vs. role-specific parts?
  6. Cursor introduced /Generate Cursor Rules for AI-assisted rule auto-generation. How can this coexist with the ETH Zurich finding that “LLM-generated instructions are harmful”? Discuss the pros and cons of the “AI drafts, human curates” approach.

  1. Error Pattern Analysis

    Analyze execution logs from the previous lab using log_analyzer.py (see Lab 06) to extract the top 5 recurring errors.

  2. 3-Tier Boundary Design

    Classify the extracted errors into Always Do / Ask First / Never Do, and add a structured instruction section to PROMPT.md.

  3. Hook Configuration

    Implement one of the Never Do items as a PreToolUse Hook to guarantee 100% enforcement. Add it to settings.json.

  4. A/B Testing

    Run the same task before and after adding instructions, comparing loop counts, token usage, and recurring error rate. Use ab_test.py from Lab 06.

  5. Connect to Lab 06

    Use the experimental results above to complete the 4 requirements in Lab 06 (analysis report, PROMPT.md, comparison experiment, graph).

Submission deadline: 2026-04-14 23:59

Requirements:

  1. Recurring error analysis report (minimum 3 patterns, including 3-tier classification)
  2. Enhanced PROMPT.md (instruction section + Always Do / Never Do structure)
  3. Experimental results comparing before and after tuning (A/B test)
  4. Quantitative graph measuring instruction effectiveness

  1. Instruction tuning = installing signs + pruning: Includes not just adding but also pruning and reprioritizing to prevent Sign Fatigue
  2. Do not generate instruction files with LLMs: ETH Zurich research — 3% success rate decrease, 20% cost increase. Humans must write only “information the agent cannot infer by reasoning”
  3. CLAUDE.md under 200 lines: instruction budget ~150–200. Use the pruning test to remove unnecessary lines
  4. Separate with Skills: Domain knowledge goes in .claude/skills/ for on-demand loading. The less-is-more effect from SkillReducer
  5. Advisory vs Deterministic: CLAUDE.md (~80% compliance) + Hooks (100% enforcement). Choose the level based on importance
  6. 5-level hierarchy: Global → project → local → parent directory → child directory. The closest one takes priority
  7. Karpathy Guidelines: Surface assumptions → Simplicity first → Surgical changes → Goal-driven execution. A practical catalog of PROMPT.md instructions
  8. AGENTS.md is an AAIF standard — A Linux Foundation standard with 146+ member organizations. 60,000+ repo adoption. Claude Code does not yet support it natively, so dual-file strategy (AGENTS.md + CLAUDE.md) is needed.
  9. Triple-layer defense (defense-in-depth) — Advisory (CLAUDE.md, ~80%) + Deterministic (Hooks, 100%) + Sandboxing. A denylist bypass case (2026-03) empirically demonstrates the need for multi-layered defense.