Karpathy Guidelines

Behavioral guidelines derived from Andrej Karpathy’s observations on LLM coding pitfalls. These classify the mistakes LLM coding agents repeatedly make into four principles, with correction methods and code examples for each.

1. Think Before Coding

Don’t assume. Don’t hide confusion. Surface tradeoffs.

2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

3. Surgical Changes

Touch only what you must. No “improving” adjacent code.

4. Goal-Driven Execution

Define success criteria. Loop until verified.

Principle 1: Think Before Coding

Before implementing, state your assumptions explicitly. If uncertain, ask. If multiple interpretations exist, present them — don’t pick silently. If a simpler approach exists, say so.

Example 1: Hidden Assumptions

Request: “Add a feature to export user data”

Common Mistake
Correct Approach

def export_users(format='json'):
    """Export all users to a file."""
    users = User.query.all()

    if format == 'json':
        with open('users.json', 'w') as f:
            json.dump([u.to_dict() for u in users], f)
    elif format == 'csv':
        with open('users.csv', 'w') as f:
            writer = csv.DictWriter(f, fieldnames=['id', 'email', 'name'])
            writer.writeheader()
            writer.writerows([u.to_dict() for u in users])

    return f"Exported {len(users)} users"

Problems:

Assumed exporting ALL users (what about pagination? privacy?)
Assumed file location without asking
Assumed which fields to include
Hardcoded CSV fieldnames without checking actual data structure

Before implementing, I need to clarify:

1. **Scope**: Export all users or filtered subset? (privacy implications)
2. **Format**: You mentioned "export" — did you mean:
   - Download file in browser?
   - Background job with email notification?
   - API endpoint returning data?
3. **Fields**: Which user fields? (some might be sensitive)
4. **Volume**: How many users typically? (affects approach)

Simplest approach: Add an API endpoint that returns paginated JSON.
Would need more info for file-based exports.

What's your preference?

Example 2: Multiple Interpretations

Request: “Make the search faster”

Common Mistake
Correct Approach

# Adds caching, database indexes, and async processing
# without asking which "faster" matters
@lru_cache(maxsize=1000)
async def search(query: str) -> List[Result]:
    # ... 200 lines of optimization code

"Make the search faster" could mean:

1. **Faster response time** (< 100ms instead of 500ms)
   - Add database indexes, cache frequent queries
   - Estimated effort: 2 hours

2. **Handle more concurrent searches** (throughput)
   - Add async/await, connection pooling
   - Estimated effort: 4 hours

3. **Faster perceived speed** (UX)
   - Show partial results immediately, progressive loading
   - Estimated effort: 3 hours

Current search takes ~500ms for typical queries.
Which aspect matters most for your use case?

Principle 2: Simplicity First

Minimum code that solves the problem. No features beyond what was asked. No abstractions for single-use code. No error handling for impossible scenarios. If you write 200 lines and it could be 50, rewrite it.

Example 1: Over-abstraction

Request: “Add a function to calculate discount”

Common Mistake
Correct Approach

from abc import ABC, abstractmethod
from dataclasses import dataclass

class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount: float) -> float:
        pass

class PercentageDiscount(DiscountStrategy):
    def __init__(self, percentage: float):
        self.percentage = percentage

    def calculate(self, amount: float) -> float:
        return amount * (self.percentage / 100)

class FixedDiscount(DiscountStrategy):
    def __init__(self, fixed_amount: float):
        self.fixed_amount = fixed_amount

    def calculate(self, amount: float) -> float:
        return min(self.fixed_amount, amount)

@dataclass
class DiscountConfig:
    strategy: DiscountStrategy
    min_purchase: float = 0.0
    max_discount: float = float('inf')

class DiscountCalculator:
    def __init__(self, config: DiscountConfig):
        self.config = config

    def apply_discount(self, amount: float) -> float:
        if amount < self.config.min_purchase:
            return 0
        discount = self.config.strategy.calculate(amount)
        return min(discount, self.config.max_discount)

# Usage requires 30+ lines of setup for a simple calculation

Problem: Strategy pattern, Config dataclass, ABC — all for a single discount calculation.

def calculate_discount(amount: float, percent: float) -> float:
    """Calculate discount amount. percent should be 0-100."""
    return amount * (percent / 100)

# Usage
discount = calculate_discount(100.0, 10.0)  # $10 off

When to add complexity: Only when you actually need multiple discount types. Refactor then.

Example 2: Speculative Features

Request: “Save user preferences to database”

Common Mistake
Correct Approach

class PreferenceManager:
    def __init__(self, db, cache=None, validator=None):
        self.db = db
        self.cache = cache or InMemoryCache()
        self.validator = validator or DefaultValidator()

    def save(self, user_id: int, prefs: dict,
             merge: bool = True,
             validate: bool = True,
             notify: bool = False) -> bool:
        if validate:
            errors = self.validator.validate(prefs)
            if errors:
                raise ValidationError(errors)

        existing = self.get(user_id) if merge else {}
        if merge:
            existing.update(prefs)
            prefs = existing

        self.db.execute(
            "UPDATE users SET preferences = ? WHERE id = ?",
            (json.dumps(prefs), user_id)
        )

        if self.cache:
            self.cache.set(f"prefs:{user_id}", prefs)

        if notify:
            self.notify_preference_change(user_id, prefs)

        return True

Problem: Caching, validation, merging, notifications — all unrequested features.

def save_preferences(db, user_id: int, preferences: dict):
    """Save user preferences to database."""
    db.execute(
        "UPDATE users SET preferences = ? WHERE id = ?",
        (json.dumps(preferences), user_id)
    )

Add later if needed: Caching (when performance matters), validation (when bad data appears), merging (when requirement emerges).

Principle 3: Surgical Changes

Touch only what you must. Don’t “improve” adjacent code, comments, or formatting. Match existing style. Clean up orphans YOUR changes created, but only mention pre-existing dead code.

The test: Every changed line should trace directly to the user’s request.

Example 1: Drive-by Refactoring

Request: “Fix the bug where empty emails crash the validator”

Common Mistake
Correct Approach

def validate_user(user_data):
    # Check email format
    if not user_data.get('email'):
    """Validate user data."""
    email = user_data.get('email', '').strip()

    # Validate email
    if not email:
        raise ValueError("Email required")

    # Basic email validation
    if '@' not in user_data['email']:
    if '@' not in email or '.' not in email.split('@')[1]:
        raise ValueError("Invalid email")

    # Check username
    if not user_data.get('username'):
    # Validate username
    username = user_data.get('username', '').strip()
    if not username:
        raise ValueError("Username required")
    if len(username) < 3:
        raise ValueError("Username too short")
    if not username.isalnum():
        raise ValueError("Username must be alphanumeric")

    return True

Problems: Enhanced email validation beyond the fix, added username validation nobody asked for, changed comments, added docstring.

def validate_user(user_data):
    # Check email format
    if not user_data.get('email'):
    email = user_data.get('email', '')
    if not email or not email.strip():
        raise ValueError("Email required")

    # Basic email validation
    if '@' not in user_data['email']:
    if '@' not in email:
        raise ValueError("Invalid email")

    # Check username
    if not user_data.get('username'):
        raise ValueError("Username required")

    return True

Only changed: The specific lines that fix empty email handling.

Example 2: Style Drift

Request: “Add logging to the upload function”

Common Mistake
Correct Approach

def upload_file(file_path, destination):
def upload_file(file_path: str, destination: str) -> bool:
    """Upload file to destination with logging."""
    logger.info(f"Uploading {file_path} to {destination}")
    try:
        with open(file_path, 'rb') as f:
        with open(file_path, "rb") as f:
            data = f.read()
        response = requests.post(destination, files={'file': data})
        response = requests.post(destination, files={"file": data})
        success = response.status_code == 200
        if success:
            logger.info(f"Upload successful: {file_path}")
        else:
            logger.error(f"Upload failed: status={response.status_code}")
        if response.status_code == 200:
            return True
        else:
            return False
        return success
    except Exception as e:
        print(f"Error: {e}")
        logger.exception(f"Upload error for {file_path}: {e}")
        return False

Problems: Changed quote style ('' to ""), added type hints, added docstring, reformatted whitespace, changed boolean return logic.

import logging

logger = logging.getLogger(__name__)

def upload_file(file_path, destination):
    logger.info(f'Starting upload: {file_path}')
    try:
        with open(file_path, 'rb') as f:
            data = f.read()

        response = requests.post(destination, files={'file': data})

        if response.status_code == 200:
            logger.info(f'Upload successful: {file_path}')
            return True
        else:
            logger.error(f'Upload failed: status={response.status_code}')
            return False
    except Exception as e:
        print(f"Error: {e}")
        logger.exception(f'Upload error: {file_path}')
        return False

Matched: Single quotes, no type hints, existing boolean pattern, spacing style.

Principle 4: Goal-Driven Execution

Transform tasks into verifiable goals. For multi-step tasks, state a brief plan with verification checks.

“Add validation” → “Write tests for invalid inputs, then make them pass”
“Fix the bug” → “Write a test that reproduces it, then make it pass”
“Refactor X” → “Ensure tests pass before and after”

Example 1: Vague vs. Verifiable

Request: “Fix the authentication system”

Common Mistake
Correct Approach

I'll fix the authentication system by:
1. Reviewing the code
2. Identifying issues
3. Making improvements
4. Testing the changes

[Proceeds to make changes without clear success criteria]

To "fix authentication," I need to define success criteria.
What specific issue are we solving?

If the issue is "users stay logged in after password change":

Plan:
1. Write test: Change password → verify old session invalidated
   Verify: Test fails (reproduces bug)

2. Implement: Invalidate sessions on password change
   Verify: Test passes

3. Check edge cases: Multiple active sessions, concurrent changes
   Verify: Additional tests pass

4. Verify no regression: Existing auth tests still pass
   Verify: Full test suite green

Example 2: Multi-Step with Verification

Request: “Add rate limiting to the API”

Common Mistake
Correct Approach

# Implements full rate limiting with Redis, multiple strategies,
# configuration system, and monitoring in one 300-line commit
# No clear verification steps

Plan for rate limiting:

1. Add basic in-memory rate limiting (single endpoint)
   Verify:
   - Test: 100 requests → first 10 succeed, rest get 429
   - Manual: curl endpoint 11 times, see rate limit error

2. Extract to middleware (apply to all endpoints)
   Verify:
   - Test: Rate limits apply to /users and /posts
   - Existing endpoint tests still pass

3. Add Redis backend (for multi-server)
   Verify:
   - Test: Rate limit persists across app restarts
   - Test: Two app instances share rate limit counter

4. Add configuration (rates per endpoint)
   Verify:
   - Test: /search allows 10/min, /users allows 100/min
   - Config file parsed correctly

Each step is independently verifiable and deployable.
Start with step 1?

Example 3: Test-First Verification

Request: “The sorting breaks when there are duplicate scores”

Common Mistake
Correct Approach

# Immediately changes sort logic without confirming the bug
def sort_scores(scores):
    return sorted(scores, key=lambda x: (-x['score'], x['name']))

# 1. First, write a test that reproduces the issue
def test_sort_with_duplicate_scores():
    scores = [
        {'name': 'Alice', 'score': 100},
        {'name': 'Bob', 'score': 100},
        {'name': 'Charlie', 'score': 90},
    ]

    result = sort_scores(scores)

    # The bug: order is non-deterministic for duplicates
    assert result[0]['score'] == 100
    assert result[1]['score'] == 100
    assert result[2]['score'] == 90

# Verify: Run test 10 times → fails with inconsistent ordering

# 2. Now fix with stable sort
def sort_scores(scores):
    """Sort by score descending, then name ascending for ties."""
    return sorted(scores, key=lambda x: (-x['score'], x['name']))

# Verify: Test passes consistently

Anti-Patterns Summary

Principle	Anti-Pattern	Fix
Think Before Coding	Silently assumes file format, fields, scope	List assumptions explicitly, ask for clarification
Simplicity First	Strategy pattern for single discount calculation	One function until complexity is actually needed
Surgical Changes	Reformats quotes, adds type hints while fixing bug	Only change lines that fix the reported issue
Goal-Driven	”I’ll review and improve the code"	"Write test for bug X → make it pass → verify no regressions”

Applying to PROMPT.md

These guidelines can be converted into the 3-tier Boundary system from Week 6: Instruction Tuning:

Tier	Guideline Application
Always Do	State assumptions before implementing, define verification plan for multi-step tasks
Ask First	When multiple interpretations exist, when adjacent code changes seem needed
Never Do	Refactoring outside request scope, speculative features, changing existing style