How big can your CLAUDE.md get before it hurts performance?

There is no error message when your CLAUDE.md gets too big. Claude Code will happily load a 4,000-word file, and nothing will look broken. That’s the trap: the cost of an oversized CLAUDE.md is invisible, paid a little at a time, on every single turn — and by the time you feel it, you’ve trained yourself to blame the model.

So: how big is too big? There’s no hard limit, and anyone who quotes you an exact token threshold is guessing. But the question has a real answer once you separate the two distinct costs of size.

Cost one: you pay for it every turn

CLAUDE.md is not loaded once and forgotten. It is injected into the model’s context near the top of the session and stays there for the entire conversation — first prompt to last. Every turn you take, the whole file is part of what the model reads before it answers.

That means the token cost is multiplied by three things at once: turns per session, sessions per day, and engineers on the team. A 2,000-word root file is roughly 2,600 tokens. Over a 40-turn session that's ~104,000 tokens of pure overhead, on top of every question asked and every line of code the model reads. Multiply across a team and it's a line item.
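The arithmetic above is easy to rerun with your own numbers. Here is a minimal sketch, assuming the ~1.3 tokens-per-word rule of thumb and treating every turn as re-reading the whole file. It ignores caching and pricing details entirely, and the sessions-per-day and team-size figures in the last line are made-up inputs for illustration.

```python
# Rough overhead model for a CLAUDE.md that is re-read on every turn.
# Assumption: ~1.3 tokens per English word (a rule of thumb, not a tokenizer).
TOKENS_PER_WORD = 1.3


def file_tokens(words: int) -> int:
    """Estimated tokens for a file with the given word count."""
    return round(words * TOKENS_PER_WORD)


def session_overhead(words: int, turns: int) -> int:
    """Tokens spent on the file alone across one session."""
    return file_tokens(words) * turns


def team_daily_overhead(words: int, turns: int,
                        sessions_per_day: int, engineers: int) -> int:
    """Tokens spent on the file alone across a team in one day."""
    return session_overhead(words, turns) * sessions_per_day * engineers


print(file_tokens(2000))                    # 2600
print(session_overhead(2000, 40))           # 104000
# Hypothetical team: 3 sessions a day, 8 engineers.
print(team_daily_overhead(2000, 40, 3, 8))  # 2496000
```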

We covered the token side in detail in five habits that quietly burn tokens — bloated examples, stale contradictory rules, platitudes the model already knows. If you haven’t done that pass yet, do it first; it’s the fastest money on the table. This article is about the other cost, the one most people never think about.

Cost two: it competes for attention with your actual work

Modern Claude models ship with very large context windows — 200K tokens as standard, up to 1M in the extended-context beta. The natural conclusion is that a few thousand tokens of CLAUDE.md is a rounding error, so size doesn’t matter. That conclusion is wrong, and the research is now clear about why.

The capacity of the window is not the same as the model’s ability to use all of it evenly. Two well-documented effects matter here:

Lost in the middle. Liu et al. (2024) showed that when the relevant document sits in the middle of a long input, multi-document QA accuracy falls sharply compared with placing it at the very start or end; in their worst cases the drop exceeded 20%. The accuracy curve is U-shaped: information at the beginning and end of the context is used most reliably, and the middle gets neglected. This isn't a bug in one model. The effect appeared across the models they tested, and positional-encoding schemes such as Rotary Position Embedding (RoPE), used in many current LLMs, are one commonly proposed contributor.

Context rot. Chroma's 2025 study formalized the broader pattern: across 18 frontier models, including Claude Opus 4, GPT-4.1, and Gemini 2.5 Pro, measured performance degrades as input length grows, well before the context window is full. The pattern held across model families, which suggests it is tied to how transformer attention handles long inputs rather than being a capability gap the next training run will simply fix. More stuff in the window means the model attends less reliably to any given piece of it.

Here’s how that connects to CLAUDE.md. The file sits in a high-attention zone at the top, so its own instructions are read well — that part is fine. The problem is everything it pushes down. A bloated CLAUDE.md shoves your actual code, the conversation so far, and the file you’re editing further into the window, closer to the neglected middle, and adds to the total length that drives context rot for the whole session. You don’t lose the CLAUDE.md; you lose attention on the work the CLAUDE.md was supposed to be helping with.

The cruel version: a 3,000-word CLAUDE.md written to make Claude more careful about your codebase can make it attend less well to your codebase, because the file crowded the window it had to reason in.

The nested-file multiplier

Claude Code merges CLAUDE.md files down a directory tree — a root file, plus one in packages/api/, plus one in the directory you happen to be working in. They stack. A team that keeps each individual file “reasonable” at 800 words can still be loading 3,000+ words of merged instructions into a session in a deep monorepo, and no one ever sees the combined total because no single file looks large.

If you maintain nested CLAUDE.md files, the number that matters is the sum that gets loaded in your deepest working directory, not the size of any one file.
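One way to see that sum is to add up every CLAUDE.md between the repository root and the directory you work in. The sketch below assumes that each file on that path ends up in the session, which is a simplification of Claude Code's actual loading rules (it leaves out the per-user file, for instance). The function names `claude_md_chain` and `merged_size` are illustrative, and the token figure again uses the ~1.3 tokens-per-word estimate.

```python
from pathlib import Path

# Assumption: ~1.3 tokens per English word (a rule of thumb, not a tokenizer).
TOKENS_PER_WORD = 1.3


def claude_md_chain(workdir: Path, repo_root: Path) -> list[Path]:
    """CLAUDE.md files from repo_root down to workdir, outermost first."""
    workdir, repo_root = workdir.resolve(), repo_root.resolve()
    if workdir != repo_root and repo_root not in workdir.parents:
        raise ValueError(f"{workdir} is not inside {repo_root}")
    # Walk upward from the working directory and stop at the repo root.
    dirs = [workdir, *workdir.parents]
    dirs = dirs[: dirs.index(repo_root) + 1]
    candidates = (d / "CLAUDE.md" for d in reversed(dirs))
    return [f for f in candidates if f.is_file()]


def merged_size(workdir: Path, repo_root: Path) -> tuple[int, int]:
    """Return (total words, estimated tokens) across the whole chain."""
    files = claude_md_chain(workdir, repo_root)
    words = sum(len(f.read_text(encoding="utf-8").split()) for f in files)
    return words, round(words * TOKENS_PER_WORD)
```

Calling `merged_size(Path("packages/api/src"), Path("."))` from the repository root gives the combined figure for that working directory. Run it against your deepest directory, since that is where the stack is tallest.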

A practical size budget

There’s no measured cliff, so treat these as editorial targets, not physics:

| Root `CLAUDE.md` size | What to expect |
| --- | --- |
| Under ~100 lines / ~800 tokens | The healthy zone. Cheap per turn, negligible attention cost, easy to keep contradiction-free. Aim here. |
| ~100–250 lines | Fine if every line earns its place. Most files this size are 40% prunable. Re-read it monthly. |
| 250–500 lines | Almost always bloated. You're paying real per-turn tokens and starting to crowd the window. Audit it. |
| 500+ lines | Actively working against you. The model is reconciling contradictions and your real work is being pushed toward the middle. Cut hard. |

These are for the root file. Add the nested multiplier on top if you have it.
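If you want the budget as a check you can drop into a script, a minimal sketch might look like this. The thresholds are the editorial targets from the table, not measured limits. Two things here are assumptions of the sketch rather than anything the table specifies: only non-blank lines are counted, and the fuzzy boundaries are treated as hard cutoffs.

```python
def count_lines(text: str) -> int:
    """Count non-blank lines (an assumption; the budget does not say how to count)."""
    return sum(1 for line in text.splitlines() if line.strip())


def budget_zone(line_count: int) -> str:
    """Map a root-file line count onto the editorial budget zones."""
    if line_count < 100:
        return "healthy zone: aim here"
    if line_count <= 250:
        return "fine if every line earns its place: re-read monthly"
    if line_count <= 500:
        return "almost always bloated: audit it"
    return "actively working against you: cut hard"


sample = "Use pnpm, never npm.\n\nNever edit generated files.\n"
print(count_lines(sample))               # 2
print(budget_zone(count_lines(sample)))  # healthy zone: aim here
```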

The reason “under ~100 lines” keeps showing up, both in this piece and in our token-burning habits article, isn't that it's a magic number. It's that almost every codebase's genuinely non-obvious, contrarian, must-know rules fit in about that much space. Everything beyond it is usually examples that belong in code comments, platitudes the model already knows, or personal preferences that belong in your per-user ~/.claude/CLAUDE.md.

How to actually measure yours

You don’t have to guess. Two quick checks:

Token count, not word count. Run your file through any tokenizer (or just multiply words by ~1.3). That’s the number you pay per turn. If it’s over ~1,500 tokens for the root file alone, you have a target.
The read-aloud test. Read the whole file top to bottom in one sitting. Every time you hit a line you’d be embarrassed to say to a competent new hire (“write clean code”), or a rule that contradicts an earlier one, mark it. Delete the marks. Most files lose a third of their length on the first pass with zero loss of guidance — usually better guidance, because the signal-to-noise improved.
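Both checks can be roughed out in a few lines. The sketch below estimates tokens with the words-times-1.3 shortcut and flags lines containing stock platitudes as a starting point for the read-aloud pass. Only "write clean code" comes from the text above; the other phrases in `PLATITUDES` are hypothetical examples, and a phrase match is no substitute for actually reading the file, since it cannot spot contradictions.

```python
# Assumption: ~1.3 tokens per English word (a rule of thumb, not a tokenizer).
TOKENS_PER_WORD = 1.3
ROOT_TOKEN_TARGET = 1500  # the "you have a target" threshold for the root file

# "write clean code" is the article's example; the rest are hypothetical.
PLATITUDES = ("write clean code", "follow best practices", "be careful")


def estimate_tokens(text: str) -> int:
    """Estimate tokens from the word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def flag_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain a listed platitude."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(phrase in lowered for phrase in PLATITUDES):
            flagged.append((number, line.strip()))
    return flagged


sample = "Always write clean code.\nNever call legacy_sync() outside jobs/.\n"
print(estimate_tokens(sample) > ROOT_TOKEN_TARGET)  # False
print(flag_lines(sample))  # [(1, 'Always write clean code.')]
```

For an exact figure, swap `estimate_tokens` for a real tokenizer; the estimate is only meant to tell you which side of the target you are on.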

The test for keeping a line is the same one from the writing guide: would a fresh, well-trained model do this anyway? If yes, the line is dead weight no matter how reasonable it sounds. The lines that earn their place are the ones the model could not have guessed — your specific framework, your one untouchable function, the deploy quirk that bites every newcomer.

The short version

Large context windows did not make CLAUDE.md size free. They moved the cost from “it won’t fit” to “it quietly degrades everything else in the window.” A tight file is read well and leaves room for the model to reason about your actual code. A bloated one pays twice — in tokens every turn, and in attention every turn — for guidance you could have written in a quarter of the space.

If you want a second pair of eyes on yours, that’s the CLAUDE.md audit we run on this site: 90 minutes, a written diagnosis of what’s costing you, one rewritten root file, and five reusable templates. $299 solo, $799 for a team of 2–10. Size and the habits behind it are most of what we find.

Companion reading

Five habits in your CLAUDE.md that are quietly burning tokens — the per-turn token side, with before/after rewrites.
How to write a great CLAUDE.md — what belongs in the file in the first place.
The CLAUDE.md problem in teams of 5+ — how size and drift compound once more than one person edits the file.

Sources: Liu et al., “Lost in the Middle: How Language Models Use Long Contexts” (2024); Chroma, “Context Rot” (2025); Anthropic Claude Code documentation on memory and CLAUDE.md loading.