Skip to content

Karpathy Guidelines

Behavioral guidelines derived from Andrej Karpathy’s observations on LLM coding pitfalls. These classify the mistakes LLM coding agents repeatedly make into four principles, with correction methods and code examples for each.

1. Think Before Coding

Don’t assume. Don’t hide confusion. Surface tradeoffs.

2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

3. Surgical Changes

Touch only what you must. No “improving” adjacent code.

4. Goal-Driven Execution

Define success criteria. Loop until verified.


Before implementing, state your assumptions explicitly. If uncertain, ask. If multiple interpretations exist, present them — don’t pick silently. If a simpler approach exists, say so.

Request: “Add a feature to export user data”

def export_users(format='json'):
"""Export all users to a file."""
users = User.query.all()
if format == 'json':
with open('users.json', 'w') as f:
json.dump([u.to_dict() for u in users], f)
elif format == 'csv':
with open('users.csv', 'w') as f:
writer = csv.DictWriter(f, fieldnames=['id', 'email', 'name'])
writer.writeheader()
writer.writerows([u.to_dict() for u in users])
return f"Exported {len(users)} users"

Problems:

  • Assumed exporting ALL users (what about pagination? privacy?)
  • Assumed file location without asking
  • Assumed which fields to include
  • Hardcoded CSV fieldnames without checking actual data structure

Request: “Make the search faster”

# Adds caching, database indexes, and async processing
# without asking which "faster" matters
@lru_cache(maxsize=1000)
async def search(query: str) -> List[Result]:
# ... 200 lines of optimization code

Minimum code that solves the problem. No features beyond what was asked. No abstractions for single-use code. No error handling for impossible scenarios. If you write 200 lines and it could be 50, rewrite it.

Request: “Add a function to calculate discount”

from abc import ABC, abstractmethod
from dataclasses import dataclass
class DiscountStrategy(ABC):
@abstractmethod
def calculate(self, amount: float) -> float:
pass
class PercentageDiscount(DiscountStrategy):
def __init__(self, percentage: float):
self.percentage = percentage
def calculate(self, amount: float) -> float:
return amount * (self.percentage / 100)
class FixedDiscount(DiscountStrategy):
def __init__(self, fixed_amount: float):
self.fixed_amount = fixed_amount
def calculate(self, amount: float) -> float:
return min(self.fixed_amount, amount)
@dataclass
class DiscountConfig:
strategy: DiscountStrategy
min_purchase: float = 0.0
max_discount: float = float('inf')
class DiscountCalculator:
def __init__(self, config: DiscountConfig):
self.config = config
def apply_discount(self, amount: float) -> float:
if amount < self.config.min_purchase:
return 0
discount = self.config.strategy.calculate(amount)
return min(discount, self.config.max_discount)
# Usage requires 30+ lines of setup for a simple calculation

Problem: Strategy pattern, Config dataclass, ABC — all for a single discount calculation.

Request: “Save user preferences to database”

class PreferenceManager:
def __init__(self, db, cache=None, validator=None):
self.db = db
self.cache = cache or InMemoryCache()
self.validator = validator or DefaultValidator()
def save(self, user_id: int, prefs: dict,
merge: bool = True,
validate: bool = True,
notify: bool = False) -> bool:
if validate:
errors = self.validator.validate(prefs)
if errors:
raise ValidationError(errors)
existing = self.get(user_id) if merge else {}
if merge:
existing.update(prefs)
prefs = existing
self.db.execute(
"UPDATE users SET preferences = ? WHERE id = ?",
(json.dumps(prefs), user_id)
)
if self.cache:
self.cache.set(f"prefs:{user_id}", prefs)
if notify:
self.notify_preference_change(user_id, prefs)
return True

Problem: Caching, validation, merging, notifications — all unrequested features.


Touch only what you must. Don’t “improve” adjacent code, comments, or formatting. Match existing style. Clean up orphans YOUR changes created, but only mention pre-existing dead code.

The test: Every changed line should trace directly to the user’s request.

Request: “Fix the bug where empty emails crash the validator”

def validate_user(user_data):
# Check email format
if not user_data.get('email'):
"""Validate user data."""
email = user_data.get('email', '').strip()
# Validate email
if not email:
raise ValueError("Email required")
# Basic email validation
if '@' not in user_data['email']:
if '@' not in email or '.' not in email.split('@')[1]:
raise ValueError("Invalid email")
# Check username
if not user_data.get('username'):
# Validate username
username = user_data.get('username', '').strip()
if not username:
raise ValueError("Username required")
if len(username) < 3:
raise ValueError("Username too short")
if not username.isalnum():
raise ValueError("Username must be alphanumeric")
return True

Problems: Enhanced email validation beyond the fix, added username validation nobody asked for, changed comments, added docstring.

Request: “Add logging to the upload function”

def upload_file(file_path, destination):
def upload_file(file_path: str, destination: str) -> bool:
"""Upload file to destination with logging."""
logger.info(f"Uploading {file_path} to {destination}")
try:
with open(file_path, 'rb') as f:
with open(file_path, "rb") as f:
data = f.read()
response = requests.post(destination, files={'file': data})
response = requests.post(destination, files={"file": data})
success = response.status_code == 200
if success:
logger.info(f"Upload successful: {file_path}")
else:
logger.error(f"Upload failed: status={response.status_code}")
if response.status_code == 200:
return True
else:
return False
return success
except Exception as e:
print(f"Error: {e}")
logger.exception(f"Upload error for {file_path}: {e}")
return False

Problems: Changed quote style ('' to ""), added type hints, added docstring, reformatted whitespace, changed boolean return logic.


Transform tasks into verifiable goals. For multi-step tasks, state a brief plan with verification checks.

  • “Add validation” → “Write tests for invalid inputs, then make them pass”
  • “Fix the bug” → “Write a test that reproduces it, then make it pass”
  • “Refactor X” → “Ensure tests pass before and after”

Request: “Fix the authentication system”

I'll fix the authentication system by:
1. Reviewing the code
2. Identifying issues
3. Making improvements
4. Testing the changes
[Proceeds to make changes without clear success criteria]

Request: “Add rate limiting to the API”

# Implements full rate limiting with Redis, multiple strategies,
# configuration system, and monitoring in one 300-line commit
# No clear verification steps

Request: “The sorting breaks when there are duplicate scores”

# Immediately changes sort logic without confirming the bug
def sort_scores(scores):
return sorted(scores, key=lambda x: (-x['score'], x['name']))

PrincipleAnti-PatternFix
Think Before CodingSilently assumes file format, fields, scopeList assumptions explicitly, ask for clarification
Simplicity FirstStrategy pattern for single discount calculationOne function until complexity is actually needed
Surgical ChangesReformats quotes, adds type hints while fixing bugOnly change lines that fix the reported issue
Goal-Driven”I’ll review and improve the code""Write test for bug X → make it pass → verify no regressions”

These guidelines can be converted into the 3-tier Boundary system from Week 6: Instruction Tuning:

TierGuideline Application
Always DoState assumptions before implementing, define verification plan for multi-step tasks
Ask FirstWhen multiple interpretations exist, when adjacent code changes seem needed
Never DoRefactoring outside request scope, speculative features, changing existing style