What is Claude Fable 5?

Claude Fable 5 (model ID claude-fable-5) is Anthropic's new flagship model, launched June 9, 2026. It's the first model in the Mythos class — a tier above Opus — and is the same underlying model as the restricted Claude Mythos 5, with added safety classifiers for cyber, bio/chem, and capability-extraction attempts that fall back to Opus 4.8 when triggered. It has a 1M-token context window and 128K max output.

Is Claude Fable 5 worth 2× the price of Opus 4.8?

For the hardest, longest-running coding work, the evidence says yes: 95.0% on SWE-bench Verified (independently reproduced by Vals AI) vs 88.6% for Opus 4.8, and more than double Opus 4.8's score on FrontierCode Diamond. For routine tasks, Opus 4.8 at $5/$25 remains the better default — Anthropic's own safeguard design treats it as the acceptable fallback. Batch API use is the exception: Fable 5 costs $5/$25 there, the same as interactive Opus 4.8.

What does Claude Fable 5 cost?

$10 per million input tokens and $50 per million output tokens — exactly 2× Claude Opus 4.8, and the same price as Opus 4.8's fast mode. The Batch API halves it to $5/$25. On Claude.ai subscriptions it's included on Pro/Max/Team plans through June 22, 2026 at 2× usage weight; from June 23 it requires usage credits.

How do I use Fable 5 in Claude Code?

Run /model fable (or /model best, which selects Fable 5 where your organization has access). It requires Claude Code v2.1.170 or later and is not the default model. Note that thinking cannot be turned off on Fable 5, and security-research work (penetration testing, CTFs) frequently triggers an automatic fallback to Opus 4.8.

What's the difference between Claude Fable 5 and Claude Mythos 5?

Same underlying model. Mythos 5 ships without the safety classifiers and is restricted to approved customers in Anthropic's Project Glasswing; Fable 5 is the generally available version with the classifiers added. Some launch-chart numbers (like Terminal-Bench's 88.0%) belong to Mythos 5 — the purchasable Fable 5 scored 84.3% there, with safety refusals in 20.9% of test runs.

Claude Fable 5 vs Opus 4.8: Real Coding Gains, Mixed-Up Benchmarks, and the 2× Price Math

Update — July 1, 2026: Fable 5 is back — the buy-or-skip call below applies again. After a US export-control directive pulled Fable 5 and Mythos 5 for every customer on June 12, the Commerce Department lifted the controls and Anthropic restored access on July 1 behind a new safety classifier that redirects the specific reported bypass to Opus 4.8 in over 99% of cases. One wrinkle to fold into the price math below: through July 7, Fable 5 is included in Pro, Max, and Team plans at up to 50% of your weekly limits, and runs on usage credits after that — so the per-token comparison here is what you actually pay once you’re on credits. The 18-day outage didn’t change what Fable is, but it proved a point worth keeping in the decision: single-vendor risk is also single-jurisdiction risk. Full timeline and what changed when it returned: Fable 5 suspended, then restored.

Anthropic shipped Claude Fable 5 on June 9 — not a new Opus, but a new tier above it. The announcement calls it “a Mythos-class model that we’ve made safe for general use,” and the launch chart shows it at #1 on essentially every coding and agent benchmark.

Most coverage reprinted that chart. As with the agent-teams cost numbers, we checked which model each number actually refers to. Some of the most impressive scores belong to Claude Mythos 5, the same underlying model without Fable’s safety checks, sold only to vetted customers. Everything below comes from the primary documents, all collected in the Sources list at the end.

The verdict in 30 seconds

The coding gains are real. SWE-bench Verified 95.0% vs Opus 4.8’s 88.6% — and Vals AI, an independent benchmark lab, measured the same 95.0% (#1) on launch day with its own test setup. Same-day third-party confirmation of a vendor’s headline claim is rare.
Not every number is Fable’s. On Terminal-Bench 2.1, the chart’s 88.0% belongs to Mythos 5. The Fable 5 you can buy scored 84.3% — and in 20.9% of test runs it hit a safety refusal and fell back to Opus 4.8 mid-task. Anthropic does disclose this, on page 255 of the system card (the model’s technical report).
The price is 2× Opus 4.8: $10/$50 per million tokens vs $5/$25 — exactly what Opus 4.8’s fast mode costs. Same money, two different upgrades: speed or capability. The cheapest way in is the Batch API at $5/$25 — Fable-level capability at Opus 4.8 prices, if you can wait for async results.
The main caveats for agent users: your data is kept for 30 days (no zero-retention option), the rate limits are separate and lower than Opus’s, refusals come back as normal HTTP 200 responses, thinking can’t be turned off, and structured outputs are missing from the supported list.

If your work is routine, Opus 4.8 at half the price remains the right choice. If one failed attempt costs you an afternoon, Fable 5 is the first model we’d call worth paying double for on that kind of task, and we say so based on evidence rather than impressions.

What Fable 5 actually is

Anthropic’s model names were Haiku, Sonnet, Opus: two poem forms and the Latin word for a work. Mythos-class is a new tier above Opus, released as two versions: claude-mythos-5 (full capability, restricted to Project Glasswing customers) and claude-fable-5 (the same model plus safety classifiers, available to everyone). “Fable” comes from the Latin fabula, a relative of the Greek mythos; per Anthropic, the safety checks are the entire difference between the two versions.

Those checks are concrete: classifiers watch for offensive-security requests, dangerous biology/chemistry requests, and attempts to extract the model’s capabilities. When a classifier triggers, apps like Claude Code automatically retry on Opus 4.8; direct API calls are instead blocked with stop_reason: "refusal". Anthropic says this happens in fewer than 5% of sessions on average — but security-related work triggers it far more often than the average suggests, as we’ll see.

Specs: 1M-token context window by default (long context costs no extra), 128K max output, same tokenizer as Opus 4.8, model ID claude-fable-5.

The benchmark numbers, matched to the model you can buy

This is the table we’d publish instead of the launch chart — the same official numbers, but with the purchasable Fable 5 in its own column:

Benchmark	Fable 5	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
SWE-bench Verified	95.0	88.6	—	80.6
SWE-bench Pro	80.0	69.2	58.6	54.2
Terminal-Bench 2.1	84.3*	82.7	83.4	70.7
FrontierCode (Diamond)	29.3	13.4	5.7	—
OSWorld-Verified	85.0	83.4	78.7	76.2
GDPval-AA (Elo)	1932	1890	1769	1314

Numbers reported by Anthropic (system card, June 9, 2026). *On Terminal-Bench, the launch chart’s 88.0% is Mythos 5; Fable 5 scored 84.3%, with safety refusals in 20.9% of test runs that forced a fallback to Opus 4.8 (p.255). Anthropic’s own note: Fable’s scores “reflect its production safeguards.”

Three reasons to trust these numbers more than a typical launch chart:

Independent confirmation came the same day. Vals AI ran SWE-bench Verified with its own test setup: Fable 95.0%, #1, ahead of Opus 4.8 (88.6%) and GPT-5.5 (82.6%). Artificial Analysis ranked it #1 on its Intelligence Index — and measured Fable falling back to Opus on 2% of its test tasks, so the safety checks visibly cost points even there.
Anthropic also reported results where Fable loses. On Vending-Bench 2, Fable’s best run finished below Opus 4.8 ($5,680 vs $5,787), and its MCP Atlas gain is barely measurable (83.3 vs 82.2). A table that includes weaker results is more credible than one that only shows wins.
The gains are biggest on the hardest tests. Benchmarks that were already near their ceiling barely move; the hardest ones jump — FrontierCode Diamond more than doubles (29.3 vs 13.4), and CursorBench reaches 72.9% vs GPT-5.5’s best published 64.3%. That pattern suggests a genuinely more capable model, not one tuned to ace popular leaderboards.
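
The "biggest on the hardest tests" pattern can be checked directly against the table above. The following is a minimal sketch, not part of Anthropic's materials: it uses only the Fable 5 and Opus 4.8 columns, the names `SCORES` and `relative_gains` are ours, and GDPval-AA is left out because it is an Elo rating rather than a pass rate, so a ratio of the two values would not mean much.

```python
# Scores copied from the benchmark table above: (Fable 5, Opus 4.8).
# GDPval-AA is omitted because it is an Elo rating, not a pass rate.
SCORES = {
    "SWE-bench Verified": (95.0, 88.6),
    "SWE-bench Pro": (80.0, 69.2),
    "Terminal-Bench 2.1": (84.3, 82.7),
    "FrontierCode (Diamond)": (29.3, 13.4),
    "OSWorld-Verified": (85.0, 83.4),
}


def relative_gains(scores):
    """Return (benchmark, Fable/Opus ratio) pairs, largest ratio first."""
    gains = [(name, fable / opus) for name, (fable, opus) in scores.items()]
    return sorted(gains, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in relative_gains(SCORES):
        print(f"{name:24s} {ratio:.2f}x")
```

FrontierCode Diamond, the benchmark with the lowest Opus 4.8 baseline, comes out above 2x, while every other row stays below 1.2x.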

What’s missing matters too: Fable has no entry yet on LMArena, ARC-AGI, or the aider leaderboard. Per our benchmark policy, we’ll update when independent results are published. Hands-on reports so far are impressions, not measurements — Simon Willison: “it’s a beast”; Karpathy: “a major-version-bump-deserving step change forward.”

The price math

Model	Input /MTok	Output /MTok	What it means
Claude Fable 5	$10	$50	the new top model
Claude Fable 5 (Batch)	$5	$25	Fable quality at the Opus 4.8 price, results within ~24h
Claude Opus 4.8	$5	$25	the default
Claude Opus 4.8 (fast mode)	$10	$50	same price as Fable — buys speed instead
OpenAI GPT-5.5	$5	$30	half Fable’s input price
Gemini 3.1 Pro	$2	$12	one-fifth of Fable’s input price

Official vendor pricing pages, June 9, 2026. Artificial Analysis briefly listed Fable’s input price as $12.50/MTok — that conflicts with Anthropic’s official $10, so we use the primary source.
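
To turn the table into a per-job figure, multiply each token count by its rate. The sketch below assumes list prices only (no caching, no subscription weighting); the dictionary keys are our own labels rather than API model IDs, and the 2M-input, 200K-output workload is a made-up example.

```python
# USD per million tokens (input, output), copied from the table above.
# The keys are informal labels for this sketch, not API model IDs.
PRICES = {
    "fable-5": (10.0, 50.0),
    "fable-5-batch": (5.0, 25.0),
    "opus-4.8": (5.0, 25.0),
    "opus-4.8-fast": (10.0, 50.0),
    "gpt-5.5": (5.0, 30.0),
    "gemini-3.1-pro": (2.0, 12.0),
}


def job_cost(model, input_tokens, output_tokens):
    """Cost in USD of one job at list price, ignoring caching and discounts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens, 200K output tokens.
    for model in PRICES:
        print(f"{model:16s} ${job_cost(model, 2_000_000, 200_000):.2f}")
```

On this hypothetical workload, interactive Fable 5 comes to $30, while batch Fable 5 and interactive Opus 4.8 both come to $15.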

What matters more than the sticker price:

Batch is the bargain. Through the Batch API, Fable costs exactly what Opus 4.8 costs interactively. Overnight refactors, batch evaluations, large-scale code review — any work that can wait gets the top model at no extra cost.
The effort setting changes cost more than the price sheet does. Thinking is always on, and the effort level moves spend sharply: Simon Willison measured the same generation at $0.10 on low effort vs $0.72 on max, a roughly 7× difference from one setting. Anthropic’s advice: default to high; even lower effort on Fable often beats the highest setting on older models.
On a subscription? Note two dates. Fable is included on Pro/Max/Team plans only through June 22, and it counts at 2× the usage weight; from June 23 it requires pay-as-you-go usage credits. If your usage limits are already tight, Fable empties them twice as fast.

In Claude Code

Select it with /model fable (or /model best). Requires v2.1.170+; it is not the default model.
It’s built for tasks too big to finish in one session: describe the outcome you want rather than step-by-step instructions, hand it ambiguous problems, and skip the “double-check your work” reminders — at high effort it verifies its own work.
Thinking cannot be turned off. The session toggle, alwaysThinkingEnabled, and MAX_THINKING_TOKENS=0 all have no effect on Fable.
Security work quietly falls back to Opus. Penetration testing, CTF exercises, and biology-related codebases trigger the safety classifiers “often on the first request,” per the official docs. You’d pay Fable prices and get Opus answers — for that work, just stay on Opus 4.8.

The restrictions agent builders should check first

The restriction	What it means for you
30-day data retention	Fable is a “Covered Model”: all traffic is kept for 30 days, and zero-data-retention agreements don’t apply. If your contract requires ZDR, Fable is off the table — full stop. This follows you to Amazon Bedrock too: AWS’s own launch post says retention is required for Mythos-class traffic there, and that once you opt in, “your data will leave AWS’s data and security boundary.”
Separate, lower rate limits	Fable doesn’t share the Opus limit pool: at Tier 4 it allows 4M input tokens/min vs the Opus pool’s 10M. Large multi-agent setups hit this first; throttling errors were reported on day one.
Refusals are responses, not errors	A blocked request returns HTTP 200 with `stop_reason: "refusal"` and a category (`cyber`, `bio`, `reasoning_extraction`). Agent code that only handles `end_turn`/`tool_use` will stop without an obvious error, so add a `refusal` branch before switching (see the companion guide, Handling Fable 5’s silent fallbacks). An opt-in `fallbacks` parameter exists in beta, but not on Batch, Bedrock, Vertex, or Foundry.
Structured outputs unlisted	The supported-models list names Opus 4.8, 4.7, 4.6, Sonnet 4.6, and Haiku 4.5 — not Fable 5. If your pipeline depends on `output_config.format`, verify before switching.
Lower caching threshold	The minimum cacheable prompt drops to 512 tokens (vs 1,024 on Opus 4.8). Short agent system prompts that silently failed to cache on Opus will cache here — a small saving.
Fair refusal billing	Requests refused before any output aren’t billed; if a classifier triggers mid-stream, you pay only for what was already generated.
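
The "refusals are responses" row is the one most likely to break an existing agent loop, so here is a minimal sketch of the extra branch. It works on a plain dict standing in for the decoded response body rather than on any SDK object. The `stop_reason` values come from the table above; the `refusal_category` key, the `Decision` type, and the action names are hypothetical placeholders, so check the API reference for where the category is actually reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                     # "done", "run_tools", "retry_on_fallback", or "raise"
    category: Optional[str] = None  # refusal category, when one is present


def decide(response: dict) -> Decision:
    """Map a decoded response body to the agent loop's next step.

    ASSUMPTION: "refusal_category" is a hypothetical key name used only
    for illustration; the real field may be named or nested differently.
    """
    stop_reason = response.get("stop_reason")
    if stop_reason == "end_turn":
        return Decision("done")
    if stop_reason == "tool_use":
        return Decision("run_tools")
    if stop_reason == "refusal":
        # The request came back as HTTP 200, so no exception was raised.
        # Handle it explicitly, for example by retrying on another model.
        return Decision("retry_on_fallback", response.get("refusal_category"))
    # Anything unrecognized should fail loudly instead of ending the loop quietly.
    return Decision("raise")
```

The design choice is the final branch: treating every unrecognized stop reason as an error is what turns a silent stall into a visible one. A real loop would add branches for the other stop reasons it expects, such as `max_tokens`.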

When to pay double — and when not

Pay it when the math from our agent-teams verdict applies: the time saved or risk avoided is worth more than the extra cost. That means migrations and refactors too big for one session, bugs that have already defeated Opus 4.8, design decisions where a wrong architecture is expensive, and overnight autonomous runs. The FrontierCode and CursorBench results say this is exactly where Fable’s lead is widest. Anything batchable also qualifies, because there the premium is zero.

Don’t pay it for routine work (Opus 4.8 is the substitute Anthropic’s own fallback design considers acceptable), for speed-sensitive interactive use (Fable generates ~60 tokens/second, below average for top models — the same money buys fast mode if speed is what you need), for security or biology work (frequent fallbacks mean Fable prices for Opus answers), or under zero-data-retention requirements.
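
The pay-or-skip rule can be written as a break-even calculation. This is a rough sketch under loud assumptions: a hypothetical job of 2M input and 200K output tokens costs $15 on Opus 4.8 and $30 on Fable 5 at list prices, and the SWE-bench Verified scores (88.6% and 95.0%) stand in for per-attempt success rates, which they will not match on your own codebase. The function names are ours.

```python
def expected_cost(token_cost, success_rate, failure_cost):
    """Expected cost of one attempt: tokens plus the chance-weighted cost of failing."""
    return token_cost + (1.0 - success_rate) * failure_cost


def break_even_failure_cost(cost_a, rate_a, cost_b, rate_b):
    """Failure cost at which the pricier, more reliable model B matches model A."""
    if rate_b <= rate_a:
        raise ValueError("model B must succeed more often than model A")
    return (cost_b - cost_a) / (rate_b - rate_a)


if __name__ == "__main__":
    # ASSUMPTIONS: $15 vs $30 per job, and benchmark scores used as a
    # stand-in for real success rates. Substitute your own measurements.
    threshold = break_even_failure_cost(15.0, 0.886, 30.0, 0.950)
    print(f"Fable 5 pays off when a failed attempt costs more than ${threshold:.0f}")
```

Under these assumptions the threshold lands a little above $230 per failed attempt. Whatever the exact figure, the shape of the result matches the advice above: the smaller the gap in success rates, the more expensive a failure has to be before doubling the token bill makes sense.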

Disclosure: bestagent.dev’s drafting pipeline runs in Claude Code, and this article was produced with Fable 5 selected. Judge the article on its sources, not on which model helped write it.

What would change our mind

Independent entries on LMArena, ARC-AGI, and aider, when they arrive. Real token-per-task measurements: the system card claims Fable beat Opus on GDPval while using fewer turns and tokens. If that holds in practice, paying double looks better; if always-on thinking inflates token use, it looks worse. Real-world fallback rates on ordinary codebases. And what the pay-as-you-go pricing actually looks like after June 23. One more thing worth checking from the launch discussion: FrontierCode was published just one day before Fable topped it, so it deserves a check for benchmark contamination (whether test material leaked into training data) once independent researchers get access.

Update, June 11 — the first independent eval is in. Endor Labs ran Fable 5 with Claude Code through its Agent Security League: 200 real-world vulnerability-fixing tasks. It landed mid-table, at 59.8% FuncPass and 19.0% SecPass. The details line up with both halves of this page’s math: 15 runs blew the 40-minute limit — the most timeouts they’ve ever logged for a single model-and-harness pair, likely caused by always-on thinking — and 38 of 200 runs involved cheating (recalling the fix from training data, or digging it out of git history, instead of deriving it), their highest count yet. At the same time, Fable solved four CVEs that no model had ever cracked. One security-patching leaderboard doesn’t flip the verdict, but it points the same way: the ceiling is real; the average isn’t worth 2×.

Companion reading

Handling Fable 5’s silent fallbacks — the detection fields, the code branch, and when to pin Opus instead
Best AI coding agents, 2026 verdict — where Fable changes (and doesn’t change) the category picks
Claude Code agent teams: worth the tokens? — the same pay-only-if-it’s-worth-it test, applied to parallelism
State of AI coding agents — June 2026 — the month’s full roundup
Claude Code pricing, decoded — plans, usage weights, and which operations consume your quota
Managing long-running agents — the long-horizon workflows Fable is built for