Week 5: Context Management and Preventing Context Rot
Theory
Why Context Management is Central in Week 5
In Week 4 we learned the power of the loop paradigm — calling the same model repeatedly while using deterministic validation to ensure quality. But for a loop to work, there is a prerequisite: the context must be clean.
Huntley, whom we met in Week 4, knew about this problem. That’s why he chose a “fresh context” strategy — starting a new session on every loop iteration. But that alone isn’t enough:
- Even inside a loop, context gets polluted — tool call results, failed attempts, and error messages accumulate
- State must be passed between sessions — restarting from scratch every time loses the learning from the previous loop
- The prerequisite for Week 6’s instruction tuning: even if you add constraints in CLAUDE.md, a polluted context will cause the agent to “forget” those constraints
The central question for this week: how do we manage context deterministically?
To answer that question, we’ll first confirm with empirical data why Context Rot happens and how serious it is, then build up solutions from there.
What is Context Rot?
Context Rot is the phenomenon where an agent’s context window becomes increasingly polluted over a long session, dragging output quality down with it.
Empirical Data: Chroma’s Context Rot Research
“Longer context gets worse” is intuitive, but exactly how much worse was first quantified in Chroma’s 2026 empirical study, which tested 18 frontier models (Claude, GPT, Gemini, Llama, and others):
| Finding | Data |
|---|---|
| Accuracy drop at mid-window position | 30%+ |
| Correlation between input length and accuracy | Negative across all models — no exceptions |
| Counter-intuitive result | Shuffled documents scored higher accuracy than logically ordered documents |
The last finding is especially important. When documents are arranged in logical order, models tend to judge “I already saw this earlier, I can skim the rest.” Shuffling forces attention at every position, which actually raises accuracy.
Why It Happens — Three Compounding Mechanisms
The Chroma study established how much accuracy degrades. Follow-up research identified why — three mechanisms that amplify each other:
| Mechanism | Description | Scaling |
|---|---|---|
| Lost-in-the-middle | Information in mid-context positions is retrieved less accurately than at the start or end | Position-dependent |
| Attention dilution | As context grows, each token’s share of attention decreases | Quadratic — doubling length quadruples dilution |
| Distractor interference | Irrelevant information actively degrades reasoning about relevant information | Proportional to noise ratio |
MECW — Maximum Effective Context Window
An important concept from this research: MECW (Maximum Effective Context Window) — the point at which a model’s accuracy on in-context information drops below a usable threshold. MECW ≠ the advertised maximum context. In benchmarks, top models begin to fail well before their nominal limits. This is why effective usage in the table below is 60-70%, not 100%.
The 1M Token Era — Is a Larger Window the Solution?
Frontier model context windows as of 2026:
| Model | Official Context | Effective Usage |
|---|---|---|
| Claude Opus/Sonnet 4.6 | 1M tokens | ~600-700K |
| GPT-5.4 | 1M tokens | ~600-700K |
| Gemini 2.5 Pro | 1M tokens | ~600-700K |
The reason effective usage is 60-70%: the remainder is consumed by the system prompt (~50K), tool schemas (~30K), and safety margin (~200K).
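The arithmetic behind that 60-70% figure can be sketched as a simple budget helper (the overhead values are the rough estimates quoted above, not API constants):

```python
def effective_window(official_max: int = 1_000_000,
                     system_prompt: int = 50_000,
                     tool_schemas: int = 30_000,
                     safety_margin: int = 200_000) -> int:
    """Estimate usable context after fixed overheads (estimates from this section)."""
    return official_max - system_prompt - tool_schemas - safety_margin

print(effective_window())  # 720000 — roughly 70% of the nominal 1M window
```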
Compaction Strategy — What to Discard, What to Keep
When auto-compaction fires, the preservation priority determines what survives:
| Priority | What to Keep | Why |
|---|---|---|
| 1 (highest) | System prompt + CLAUDE.md | The agent’s “constitution” — losing this erases behavioral rules |
| 2 | Last 4 messages | Immediate context of the current task |
| 3 | Tool results for the current task | The file just read, the test just run |
| 4 (lowest) | Old conversation + previous tool results | Can be replaced with a summary |
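As a sketch, the priority table could be applied like this (the message shapes and the summary stub are illustrative, not Claude Code’s actual implementation; tool results for the current task ride along with the recent messages here for simplicity):

```python
def compact(messages: list[dict], keep_last: int = 4) -> list[dict]:
    """Apply the preservation priorities: the system prompt stays verbatim,
    the last N messages stay, and older history collapses into a summary stub."""
    system = [m for m in messages if m["role"] == "system"]   # priority 1
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-keep_last:]                                # priorities 2-3
    old = rest[:-keep_last]                                   # priority 4
    summary = {"role": "user",
               "content": f"[summary of {len(old)} earlier messages]"}
    return system + ([summary] if old else []) + recent
```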
When NOT to compress: In the following situations, it is better to end the session and start fresh rather than compact:
- The conversation topic has completely shifted (previous context is a hindrance)
- The same error has repeated 5+ times in a row (Context Rot is already severe)
- The summary itself is 50%+ the size of the original (no compression benefit)
This is why Huntley in Week 4 chose “fresh context” — between loop iterations, full reset + state file handoff is more deterministic than compaction.
Beyond Heuristics — The ACON Compression Framework
Current compaction strategies (including Claude Code’s) are heuristic-based — rules like “keep last 4 messages” and “summarize the rest.” ACON (Agent Context Optimization, arXiv:2510.00615, October 2025) proposes treating compression as a formal optimization problem:
| Metric | Result |
|---|---|
| Peak token reduction | 26-54% |
| Accuracy preservation | 95%+ |
| Method | Gradient-free, API-compatible (works with any model via API) |
ACON’s key insight: instead of fixed rules, dynamically score each context segment’s contribution to the current task and prune those below a threshold. This is the difference between “always keep the last 4 messages” (heuristic) and “keep the messages most relevant to the current task” (optimization).
Claude Code currently uses heuristics, but ACON represents the direction the field is moving — from rule-based to optimization-based compaction.
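The difference can be illustrated with a toy version of optimization-based pruning — here the contribution score is a naive word-overlap stand-in for ACON’s learned scorer:

```python
def score_relevance(segment: str, task: str) -> float:
    """Naive stand-in for a contribution score: word overlap with the task."""
    seg_words = set(segment.lower().split())
    task_words = set(task.lower().split())
    return len(seg_words & task_words) / max(len(task_words), 1)

def prune(segments: list[str], task: str, threshold: float = 0.2) -> list[str]:
    """Keep only segments scoring above the threshold for the current task."""
    return [s for s in segments if score_relevance(s, task) >= threshold]
```

Unlike “always keep the last 4 messages”, this keeps whatever is relevant to the current task — the heuristic/optimization distinction in miniature.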
The Ralph Loop Solution: Context Window Wiping
One of the key innovations of the Ralph Loop is completely resetting the context after a task completes or fails:
```python
class RalphContextManager:
    def __init__(self, max_tokens: int = 200_000):
        self.max_tokens = max_tokens
        self.state_file = "claude-progress.txt"

    def should_wipe_context(self, current_tokens: int) -> bool:
        """Reset context when more than 75% of the window is used."""
        return current_tokens > self.max_tokens * 0.75

    def build_fresh_context(self) -> str:
        """Deterministically reconstruct context from the state file."""
        state = self.load_state()  # load_state() (not shown) parses the state file into a dict
        return f"""# Project State
{state['completed_tasks']}

# Current Task
{state['current_task']}

# Relevant Code (current version only)
{state['relevant_code_snippet']}"""

    def save_state(self, task: str, status: str) -> None:
        """Save state for the next loop iteration."""
        with open(self.state_file, 'a') as f:
            f.write(f"[{status}] {task}\n")
```

State Tracking File Design Patterns
fix_plan.md Template:

```markdown
# Project: Calculator App

## Completed Tasks
- [x] Create basic file structure (2026-03-31 14:23)
- [x] Implement add() function and pass tests (2026-03-31 14:45)

## Current Task
- [ ] Implement subtract() function
  - Expected file: calculator.py:15-25
  - Related tests: tests/test_calculator.py:20-35

## Pending Tasks
- [ ] multiply() function
- [ ] divide() function (must handle division-by-zero exception)
```

Token Economics — Cutting 40-70% Waste
The ultimate goal of context management is to do more useful work on the same budget. Empirical data shows that 40-70% of agent input tokens are wasted — duplicate tool results, unnecessary file contents, bloated system prompts.
Model Routing — You Don’t Need Opus for Everything
| Task Type | Share | Recommended Model | Cost per 1M tokens (input / output) |
|---|---|---|---|
| Simple lookups, formatting, type checking | 60-70% | Haiku | $1 / $5 |
| Standard coding, bug fixes, feature additions | 25-30% | Sonnet | $3 / $15 |
| Architecture design, complex debugging | 5-10% | Opus | $5 / $25 |
Model routing alone enables 5-8x cost reduction. Claude Code’s effort parameter (see Week 4) is the productized form of this routing.
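A minimal router might look like this (the model names and keyword heuristic are illustrative — they are not exact API model identifiers, and a production router would use a cheap classifier model instead of keywords):

```python
ROUTES = {  # task class -> model tier, per the table above (illustrative names)
    "simple": "claude-haiku",
    "standard": "claude-sonnet",
    "complex": "claude-opus",
}

def classify(task: str) -> str:
    """Toy keyword heuristic standing in for a real task classifier."""
    t = task.lower()
    if any(k in t for k in ("architecture", "design", "debug")):
        return "complex"
    if any(k in t for k in ("format", "lookup", "typecheck")):
        return "simple"
    return "standard"

def route(task: str) -> str:
    """Map a task to the cheapest model tier that can handle it."""
    return ROUTES[classify(task)]
```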
Prompt Caching — Turning Repetition into an Asset
On every agent turn, the same system prompt, tool schemas, and CLAUDE.md content is resent. Prompt caching stores this static portion and reuses it:
| Operation | Price (vs. baseline) |
|---|---|
| Cache write (5-min TTL) | 1.25x |
| Cache write (1-hour TTL) | 2x |
| Cache read | 0.1x (90% savings) |
Implications for the loop paradigm:
- Continuous session: Create cache on first turn → read at 0.1x on subsequent turns = very economical
- Ralph fresh context: New session every loop → cache must be recreated = higher cost
- Trade-off: Context Rot prevention (fresh) vs. cache efficiency (continuous). Same problem as Week 4’s Huntley Showdown
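The trade-off can be made concrete with the multipliers above, normalizing the baseline input price to 1.0 (a back-of-envelope model of the static prefix only, ignoring dynamic per-turn tokens):

```python
def static_prefix_cost(turns: int, static_tokens: int, fresh_each_turn: bool,
                       write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Relative cost of the static prefix over a session (5-minute-TTL multipliers)."""
    if fresh_each_turn:
        # Ralph-style fresh context: the cache is rewritten on every iteration
        return turns * static_tokens * write_mult
    # Continuous session: one cache write, then cheap cached reads
    return static_tokens * (write_mult + (turns - 1) * read_mult)

print(static_prefix_cost(10, 80_000, fresh_each_turn=True))   # 1,000,000 token-units
print(static_prefix_cost(10, 80_000, fresh_each_turn=False))  # ≈ 172,000 token-units
```

On this toy model the continuous session is roughly 6x cheaper on the static prefix — the price Ralph pays for rot-free context.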
The Initializer Pattern — 2-Phase State Management
The 2-phase pattern recommended in Anthropic’s official harness guide systematizes the state file design above:
Phase 1 — Initializer (first loop):

- Parse requirements and generate a feature list as JSON
- Initialize `claude-progress.txt`
- Generate `init.sh` (environment setup script)

```json
{
  "features": [
    {"id": "F001", "name": "User Authentication", "status": "pending", "files": ["src/auth.py"]},
    {"id": "F002", "name": "Dashboard UI", "status": "pending", "files": ["src/dashboard.py"]}
  ],
  "constraints": ["pytest must pass", "100% type hints"]
}
```

Phase 2 — Coding Agent (subsequent loops):

- Run `init.sh` to configure the environment
- Pull a `"status": "pending"` item from the JSON and work on it
- On completion: set `"status": "done"` and record it in `claude-progress.txt`
- The next loop reads the JSON and picks up the remaining items
This pattern is a higher-level abstraction of the three state files introduced this week (`claude-progress.txt`, `fix_plan.md`, `@codebase_map.md`). The JSON feature list replaces `fix_plan.md`; `init.sh` replaces `@codebase_map.md`.
Persistent Knowledge Systems — The LLM Wiki Pattern
While the Initializer pattern manages state for a single task, the LLM Wiki pattern accumulates knowledge across an entire project permanently. Proposed by Karpathy, its core principle is “compile knowledge once, keep it current.”
Architecture:
| Layer | Role | Example |
|---|---|---|
| Raw Sources | Immutable original materials | Paper PDFs, meeting recordings, codebases |
| Wiki | LLM-generated and maintained markdown files | Summaries, cross-references, entity pages |
| Schema | Configuration file defining structure and workflows | Wiki rules described in CLAUDE.md |
Three Operations:
- Ingest — Read new source material, write summaries, update related wiki pages
- Query — Search the wiki to synthesize answers, file new discoveries back into the wiki
- Lint — Check for contradictions, stale information, orphaned pages, missing cross-references
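The Lint operation’s orphaned-page check, for instance, could be sketched like this (assuming `[[Page]]`-style wiki links and a flat directory of markdown pages — both assumptions, not part of the pattern’s definition):

```python
import re
from pathlib import Path

def lint_orphans(wiki_dir: str) -> list[str]:
    """Flag wiki pages that no other page links to (assumed [[Page]] link syntax)."""
    pages = {p.stem: p.read_text() for p in Path(wiki_dir).glob("*.md")}
    linked = set()
    for text in pages.values():
        linked.update(re.findall(r"\[\[([^\]]+)\]\]", text))
    return sorted(name for name in pages if name not in linked)
```

Contradiction and staleness checks would need an LLM pass; orphans and missing cross-references are cheap deterministic checks like this one.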
This pattern is the evolutionary next step beyond the Initializer: 3 state files (single task) → JSON feature list (multi-feature project) → Wiki (persistent knowledge graph). The ultimate form of context management is not re-understanding everything each session, but referencing accumulated knowledge.
Knowledge Graph-Based Token Reduction — Graphify
Instead of putting raw files directly into context, transforming a codebase into a knowledge graph and querying only the needed parts can dramatically reduce token consumption. Graphify is a practical implementation of this approach.
How it works:
- Code analysis: tree-sitter AST extraction parses classes, functions, imports, and call graphs locally (no LLM calls needed, code never leaves your machine)
- Document/media analysis: Semantic extraction from PDFs, images, videos via LLM
- Graph construction: NetworkX + Leiden clustering for community detection, interactive HTML visualization + JSON output
- Confidence tagging: Relationships classified as EXTRACTED (confirmed), INFERRED (with confidence scores), or AMBIGUOUS
Token efficiency: 71.5x token reduction compared to reading raw files directly on mixed corpora. Once the graph is built, only incremental updates via SHA256 caching are needed.
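The SHA256-based incremental update can be sketched as follows (the cache filename and layout are assumptions, not Graphify’s actual implementation):

```python
import hashlib
import json
from pathlib import Path

def changed_files(paths: list[str], cache_path: str = ".graph_cache.json") -> list[str]:
    """Return only files whose content hash differs from the cached hash,
    so only those need re-parsing into the graph."""
    cache = json.loads(Path(cache_path).read_text()) if Path(cache_path).exists() else {}
    stale = []
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        if cache.get(p) != digest:
            stale.append(p)
            cache[p] = digest
    Path(cache_path).write_text(json.dumps(cache))
    return stale
```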
This approach can be combined with the LLM Wiki pattern: use Graphify to graph the codebase structure, then layer insights and decisions on top with an LLM Wiki.
LazyGraphRAG — Lightweight Graph Retrieval at Scale
While Graphify graphs a single codebase locally, Microsoft’s LazyGraphRAG applies graph-based retrieval to enterprise-scale document collections:
| Metric | LazyGraphRAG vs Full GraphRAG |
|---|---|
| Indexing cost | 0.1% (1/1000) |
| Query cost | 4% (1/25) |
| Quality | Outperforms vector RAG, RAPTOR, and competing methods |
LazyGraphRAG builds graphs via lazy evaluation without pre-summarization, achieving retrieval quality superior even to 1M-token context windows. It is deployed in production on Microsoft Discovery (Azure).
Graphify (local codebase, individual use) vs LazyGraphRAG (massive document sets, enterprise) — two points on the cost-quality spectrum for knowledge graph approaches.
Context Strategy Comparison by Tool
Context management approaches differ by tool. As of 2026, seven strategies are competing:
| Strategy | Representative Tool | Approach | Pros | Cons |
|---|---|---|---|---|
| Explicit | Cursor | User manually selects which files go into context | Precise control, minimal token waste | Manual labor, risk of omission |
| Ambient | Windsurf (Cascade) | Tool automatically detects relevant files | Convenient, prevents omissions | Risk of over-inclusion, token waste |
| Hybrid | Claude Code | File-based persistence (CLAUDE.md) + auto-compaction | Balanced, loop-friendly | Requires setup, learning curve |
| Tiered | GitHub Copilot | 3-tier memory (user/repo/session) + Agent Mode | Familiar VS Code UX, 28-day auto-memory | VS Code-centric, limited CLI control |
| Sandboxed | Codex CLI | Container isolation + AGENTS.md | Safest execution, tool-neutral | No MCP support, 192K context limit |
| Large-window | Gemini CLI | 1M context + GEMINI.md | Free tier, largest context | Newer ecosystem, less tooling |
| Autonomous | Devin | Full autonomous sessions | End-to-end automation | Degrades after ~35 min / 10 ACUs |
VS Code Copilot introduced 3-tier memory (user/repository/session) in 2026, separating user preferences (global) → project rules (repo) → current conversation (session). This is the same design principle as Claude Code’s 3-level CLAUDE.md hierarchy (global/project/local).
Instruction File Convergence — The Industry Validates File-Based Context
A remarkable trend in 2025-2026: every major AI coding tool independently converged on file-based context management. They compete on nearly everything else, but arrived at the same architecture for context management:
| Tool | Instruction File | Hierarchy |
|---|---|---|
| Claude Code | CLAUDE.md | 3-level: ~/.claude/ (global) → project/.claude/ (project) → workspace/ (local) |
| OpenAI Codex CLI | AGENTS.md | Hierarchical directory tree discovery (closest file wins) |
| Google Gemini CLI | GEMINI.md | Project root |
| Cursor | .cursor/rules/*.mdc | Glob-pattern scoped rules |
| GitHub Copilot | .github/copilot-instructions.md | Single file + 28-day auto-memory |
| Windsurf | Cascade Memories | Auto-generated from usage patterns |
Token-Efficient Data Serialization
When agents send structured data to LLMs, the serialization format can cause 2–3x differences in token consumption. Results from serializing the same 50-user list in 7 formats:
| Format | Tokens | LLM Accuracy | Best For |
|---|---|---|---|
| CSV | ~800 | 44.3% | Pure tables (accuracy risk) |
| Markdown-KV | ~950 | 60.7% | Simple key-value retrieval |
| TOON | 993 | 76.4% | Uniform array data |
| JSON (compact) | ~1,100 | 73.7% | General purpose — safest balance |
| JSON (pretty) | 1,481 | 75.0% | When human readability is needed |
| YAML | 1,710 | 74.5% | Nested configs, prompt structuring |
| XML | 2,690 | 72.1% | Legacy system integration |
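The relative sizes are easy to reproduce with character counts as a crude proxy for tokens (real measurements need a tokenizer, so treat this only as a sanity check of the ordering, not of the exact figures above):

```python
import json

# A uniform 50-user list, similar in spirit to the benchmark data
users = [{"id": i, "name": f"user{i}", "email": f"user{i}@example.com"}
         for i in range(50)]

compact = json.dumps(users, separators=(",", ":"))
pretty = json.dumps(users, indent=2)
csv_rows = "id,name,email\n" + "\n".join(
    f"{u['id']},{u['name']},{u['email']}" for u in users)

# Same ordering as the table: CSV < compact JSON < pretty JSON
print(len(csv_rows) < len(compact) < len(pretty))  # True
```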
TOON — A Case Study in Token-Optimized Serialization
TOON (Token Oriented Object Notation, 2025) preserves JSON’s structure while removing quotes, braces, and commas, representing uniform arrays as CSV-style tables. With 23.7K GitHub stars and 1.6M monthly npm downloads, community interest is real.
```
// TOON example: uniform array → header + row format
users[3]{id,name,email}:
  1,Alice,alice@example.com
  2,Bob,bob@example.com
  3,Carol,carol@example.com
```

Strengths: 40–60% token reduction on uniform arrays vs pretty JSON.

Limitations: Can be 15–20% larger than compact JSON for non-uniform/nested data. The spec is a Working Draft (v3.0 reached in just 3 weeks from v0.8), and LLMs haven’t been trained on TOON, requiring format explanation in prompts.
Practical Recommendations
- Default: Compact JSON — all LLMs understand it, mature ecosystem. 25–40% savings vs pretty JSON.
- Large uniform tables (100+ rows): Consider TOON or CSV — but always measure the accuracy trade-off.
- API-based structured output: Function calling / structured output APIs are optimal (Microsoft measured: 42% savings vs JSON).
- Week 7 multi-agent: Design inter-agent artifacts with compact JSON. See Week 7 Artifact Handoff.
Discussion Questions
- Does a 1M token context solve Context Rot? Answer using evidence from the Chroma research data.
- Ralph’s fresh context vs. continuous session — when you factor in caching costs (fresh = recreate every time, continuous = read at 0.1x), which is more economical? Under what conditions does this flip?
- What is the basis for the advice to keep CLAUDE.md under 200 lines? Connect your answer to SkillReducer’s “less-is-more” effect.
- Reason why shuffled documents scored higher accuracy than ordered documents in the Chroma study. What does this imply for context design?
- Why does the Initializer pattern store the feature list as JSON? What problem arises if you use Markdown instead?
- MECW (Maximum Effective Context Window) is always less than the advertised maximum. If 65% of enterprise AI failures stem from context drift during multi-step reasoning, how should you design agent sessions? Discuss in connection with the “sprint” pattern (short 5-10 message interactions with fresh context).
Practicum
- Measure Token Usage: Run the `/cost` command in a Claude Code session to check the current session’s token usage. After 10 turns of conversation, measure again and record the increase.

```python
import anthropic

def count_tokens(messages: list) -> int:
    """Count input tokens for a message list via the token-counting API."""
    client = anthropic.Anthropic()
    response = client.messages.count_tokens(
        model="claude-opus-4-6",
        messages=messages,
    )
    return response.input_tokens
```

- Before/After Compaction Comparison: In a session with 20+ turns of conversation, run the `/compact` command. Record and compare token counts, response quality, and task continuity before and after compression.
- Build a State File System: Write helper functions that automatically update `fix_plan.md` and `claude-progress.txt`. Refer to `state_tracker.py` in Lab 05.
- Connect to Lab 05: Based on the experiment results above, implement the four modules in Lab 05: `token_counter.py`, `context_manager.py`, `state_tracker.py`, and `main.py`.
Assignment
Lab 05: Context Management Practice
Submission deadline: 2026-04-07 23:59
Requirements:
- Ralph Loop with integrated token counter (`ralph_with_context.sh`)
- Context Rot simulation and a graph of measurement results
- Automatic context reset logic implementation
- Demonstration that the state tracking system (`fix_plan.md` + `claude-progress.txt`) works correctly
Key Summary
- Context Rot is an empirical phenomenon — Chroma study: accuracy degrades as input grows longer across all 18 models. 30%+ degradation at mid-window positions.
- A 1M token window is not the solution — a bigger window just means a larger space for Context Rot to occur in. Effective usage is 60-70%.
- Compaction is token-based — auto-triggered at ~75% of model maximum. Preserves the last 4 messages, summarizes the rest.
- Ralph’s fresh context is best between loops — use compaction inside a loop, full reset + state file handoff between loops.
- 40-70% of tokens are wasted — cut costs with model routing (Haiku 60-70%), prompt caching (read at 0.1x), and keeping CLAUDE.md under 200 lines.
- Initializer pattern — manage state deterministically with a 2-phase structure: JSON feature list + claude-progress.txt + init.sh.
- LLM Wiki pattern — beyond per-session state files, an architecture where LLMs permanently maintain a knowledge wiki. Knowledge accumulates through Ingest → Query → Lint cycles.
- Knowledge graph token reduction — tools like Graphify that transform codebases into graphs achieve 71.5x token savings vs raw file reading. LazyGraphRAG achieves similar quality at 0.1% of full GraphRAG cost for enterprise scale.
- Instruction file convergence — All major coding tools (Claude Code, Codex CLI, Gemini CLI, Cursor, Copilot, Windsurf) independently converged on file-based context persistence. Configuration files are the universal answer to cross-session context management.
- MECW < advertised maximum — Maximum Effective Context Window is always less than the nominal limit. 65% of enterprise failures stem from context drift, not context exhaustion. Design for short, focused sessions.
Further Reading
- Chroma: Context Window Research — Empirical Context Rot study across 18 models
- Anthropic: Prompt Caching Documentation — Cache pricing, TTL, and usage
- SkillReducer (arXiv:2603.29919) — The less-is-more effect: compressing tool descriptions improves quality
- Simon Willison: YAML vs Token-Saving Formats — 9,649 experiments showing that familiar formats outperform token-saving alternatives
- Karpathy: LLM Wiki — Building Persistent Knowledge Systems — Pattern for LLMs maintaining persistent knowledge wikis
- Graphify — Transform codebases/documents into knowledge graphs, tree-sitter AST + Leiden clustering, 71.5x token savings
- Chroma Context-1 — 20B MoE agentic search model, 0.94 pruning accuracy, self-editing context
- ACON: Agent Context Optimization (arXiv:2510.00615) — Formal optimization approach to context compression, 26-54% token reduction with 95%+ accuracy
- LazyGraphRAG — Microsoft Research — 0.1% of GraphRAG indexing cost with comparable quality, deployed on Azure Discovery
- OpenAI Codex CLI — AGENTS.md — Tool-neutral hierarchical instruction file standard
- A2A Protocol (Google, 2025) — Agent-to-agent communication protocol, complementary to MCP (detailed in Week 7)