Verdict · June 9, 2026 · 7 min read

Claude Fable 5 vs Opus 4.8: Real Coding Gains, Mixed-Up Benchmarks, and the 2× Price Math

Fable 5 is the first Claude tier above Opus. The coding gains are real — independently reproduced on launch day. But some of the launch-chart numbers belong to Mythos 5, a sibling model you can't buy, and the new restrictions affect agent users most. Here's every number assigned to the right model.

Anthropic shipped Claude Fable 5 on June 9 — not a new Opus, but a new tier above it. The announcement calls it “a Mythos-class model that we’ve made safe for general use,” and the launch chart shows it at #1 on essentially every coding and agent benchmark.

Most coverage reprinted that chart. As with our agent-teams cost numbers, we checked which model each figure actually refers to. Some of the most impressive scores belong to Claude Mythos 5, the same underlying model without Fable’s safety checks, sold only to vetted customers. Everything below comes from the primary documents, all collected in the Sources list at the end.

The verdict in 30 seconds

  • The coding gains are real. SWE-bench Verified 95.0% vs Opus 4.8’s 88.6% — and Vals AI, an independent benchmark lab, measured the same 95.0% (#1) on launch day with its own test setup. Same-day third-party confirmation of a vendor’s headline claim is rare.
  • Not every number is Fable’s. On Terminal-Bench 2.1, the chart’s 88.0% belongs to Mythos 5. The Fable 5 you can buy scored 84.3% — and in 20.9% of test runs it hit a safety refusal and fell back to Opus 4.8 mid-task. Anthropic does disclose this, on page 255 of the system card (the model’s technical report).
  • The price is 2× Opus 4.8: $10/$50 per million input/output tokens vs $5/$25. That $10/$50 is exactly what Opus 4.8’s fast mode costs, so the same money buys one of two upgrades: speed or capability. The cheapest way in is the Batch API at $5/$25: Fable-level capability at Opus 4.8 prices, if you can wait for async results.
  • The main caveats for agent users: your data is kept for 30 days (no zero-retention option), the rate limits are separate and lower than Opus’s, refusals come back as normal HTTP 200 responses, thinking can’t be turned off, and structured outputs are missing from the supported list.

If your work is routine, Opus 4.8 at half the price remains the right choice. If one failed attempt costs you an afternoon, Fable 5 is the first model we’d call worth paying double for on that kind of task, and we say so on evidence rather than impressions.

What Fable 5 actually is

Anthropic’s model names have been Haiku, Sonnet, Opus: all named for literary or musical forms. Mythos-class is a new tier above Opus, released as two versions: claude-mythos-5 (full capability, restricted to Project Glasswing customers) and claude-fable-5 (the same model plus safety classifiers, available to everyone). “Fable” comes from the Latin fabula, the counterpart of the Greek mythos; per Anthropic, the safety checks are the entire difference between the two versions.

Those checks are concrete: classifiers watch for offensive-security requests, dangerous biology/chemistry requests, and attempts to extract the model’s capabilities. When a classifier triggers, apps like Claude Code automatically retry on Opus 4.8; direct API calls are instead blocked with stop_reason: "refusal". Anthropic says this happens in fewer than 5% of sessions on average — but security-related work triggers it far more often than the average suggests, as we’ll see.

Specs: 1M-token context window by default (long context costs no extra), 128K max output, same tokenizer as Opus 4.8, model ID claude-fable-5.

The benchmark numbers, matched to the model you can buy

This is the table we’d publish instead of the launch chart — the same official numbers, but with the purchasable Fable 5 in its own column:

Benchmark | Fable 5 | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro
SWE-bench Verified | 95.0 | 88.6 | 80.6
SWE-bench Pro | 80.0 | 69.2 | 58.6 | 54.2
Terminal-Bench 2.1 | 84.3* | 82.7 | 83.4 | 70.7
FrontierCode (Diamond) | 29.3 | 13.4 | 5.7
OSWorld-Verified | 85.0 | 83.4 | 78.7 | 76.2
GDPval-AA (Elo) | 1932 | 1890 | 1769 | 1314
Numbers reported by Anthropic (system card, June 9, 2026). *On Terminal-Bench, the launch chart’s 88.0% is Mythos 5; Fable 5 scored 84.3%, with safety refusals in 20.9% of test runs that forced a fallback to Opus 4.8 (p.255). Anthropic’s own note: Fable’s scores “reflect its production safeguards.”

Three reasons to trust these numbers more than a typical launch chart:

  1. Independent confirmation came the same day. Vals AI ran SWE-bench Verified with its own test setup: Fable 95.0%, #1, ahead of Opus 4.8 (88.6%) and GPT-5.5 (82.6%). Artificial Analysis ranked it #1 on its Intelligence Index — and measured Fable falling back to Opus on 2% of its test tasks, so the safety checks visibly cost points even there.
  2. Anthropic also reported results where Fable loses. On Vending-Bench 2, Fable’s best run finished below Opus 4.8 ($5,680 vs $5,787), and its MCP Atlas gain is barely measurable (83.3 vs 82.2). A table that includes weaker results is more credible than one that only shows wins.
  3. The gains are biggest on the hardest tests. Benchmarks that were already near their ceiling barely move; the hardest ones jump — FrontierCode Diamond more than doubles (29.3 vs 13.4), and CursorBench reaches 72.9% vs GPT-5.5’s best published 64.3%. That pattern suggests a genuinely more capable model, not one tuned to ace popular leaderboards.
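The third point is easy to check with arithmetic. The sketch below takes the Anthropic-reported percentages from the table above (GDPval-AA is left out because Elo is not a percentage), orders the benchmarks by Opus 4.8's score as a rough proxy for difficulty, and prints the gain in points and in relative terms. The function names are ours, and treating a low Opus score as "hard" is an assumption, not something the system card defines.

```python
# Anthropic-reported scores from the table above, as (Fable 5, Opus 4.8) in percent.
SCORES = {
    "SWE-bench Verified": (95.0, 88.6),
    "SWE-bench Pro": (80.0, 69.2),
    "Terminal-Bench 2.1": (84.3, 82.7),
    "FrontierCode (Diamond)": (29.3, 13.4),
    "OSWorld-Verified": (85.0, 83.4),
}


def point_gain(fable, opus):
    """Absolute gain in percentage points."""
    return fable - opus


def relative_gain(fable, opus):
    """Gain as a fraction of the Opus 4.8 score (1.0 means the score doubled)."""
    return fable / opus - 1.0


def hardest_first(scores):
    """Benchmark names ordered by Opus 4.8's score, lowest (hardest) first."""
    return sorted(scores, key=lambda name: scores[name][1])


for name in hardest_first(SCORES):
    fable, opus = SCORES[name]
    print(f"{name:24s} {point_gain(fable, opus):+5.1f} pts  {relative_gain(fable, opus):+7.1%}")
```

Read the relative column rather than the points column: a jump from 13.4 to 29.3 is a different kind of result from a jump of 1.6 points on a test where both models already score above 80.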

What’s missing matters too: Fable has no entry yet on LMArena, ARC-AGI, or the aider leaderboard. Per our benchmark policy, we’ll update when independent results are published. Hands-on reports so far are impressions, not measurements — Simon Willison: “it’s a beast”; Karpathy: “a major-version-bump-deserving step change forward.”

The price math

Model | Input /MTok | Output /MTok | What it means
Claude Fable 5 | $10 | $50 | the new top model
Claude Fable 5 (Batch) | $5 | $25 | Fable quality at the Opus 4.8 price, results within ~24h
Claude Opus 4.8 | $5 | $25 | the default
Claude Opus 4.8 (fast mode) | $10 | $50 | same price as Fable, but buys speed instead
OpenAI GPT-5.5 | $5 | $30 | half the input price
Gemini 3.1 Pro | $2 | $12 | one-fifth the input price
Official vendor pricing pages, June 9, 2026. Artificial Analysis briefly listed Fable’s input price as $12.50/MTok — that conflicts with Anthropic’s official $10, so we use the primary source.
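To turn the price sheet into a bill, multiply token counts by the per-million rates. The sketch below does that for each row of the table. The dictionary keys are our own labels, not API model IDs, and the example workload (2M input tokens, 200K output tokens) is a made-up figure for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Prices from the table above. Keys are illustrative labels, not API model IDs.
PRICES = {
    "fable-5": Price(10, 50),
    "fable-5-batch": Price(5, 25),
    "opus-4.8": Price(5, 25),
    "opus-4.8-fast": Price(10, 50),
    "gpt-5.5": Price(5, 30),
    "gemini-3.1-pro": Price(2, 12),
}


def cost_usd(model, input_tokens, output_tokens):
    """List-price cost of one workload in USD."""
    price = PRICES[model]
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


# Hypothetical workload: 2M input tokens, 200K output tokens.
for model in PRICES:
    print(f"{model:16s} ${cost_usd(model, 2_000_000, 200_000):.2f}")
```

This ignores caching discounts and any difference in how many tokens each model needs to finish the same task, so treat the output as a ceiling on list price, not a forecast.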

What matters more than the sticker price:

  • Batch is the bargain. Through the Batch API, Fable costs exactly what Opus 4.8 costs interactively. Overnight refactors, batch evaluations, large-scale code review — any work that can wait gets the top model at no extra cost.
  • The effort setting changes cost more than the price sheet does. Thinking is always on, and the effort level moves spend sharply: Simon Willison measured the same generation at $0.10 on low effort vs $0.72 on max, a 7.2× difference from one setting. Anthropic’s advice: default to high; even lower effort on Fable often beats the highest setting on older models.
  • On a subscription? Note two dates. Fable is included on Pro/Max/Team plans only through June 22, and it counts at 2× the usage weight; from June 23 it requires pay-as-you-go usage credits. If your usage limits are already tight, Fable empties them twice as fast.
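Two of these bullets reduce to simple ratios. The sketch below divides Simon Willison's two measured costs to get the effort multiplier, and models the 2× usage weight as a plan allowance that drains twice as fast. The "allowance units" are a hypothetical abstraction of ours; Anthropic does not publish subscription limits in this form.

```python
# Figures quoted in the bullets above.
LOW_EFFORT_USD = 0.10   # Simon Willison's measurement, low effort
MAX_EFFORT_USD = 0.72   # the same generation on max effort
FABLE_USAGE_WEIGHT = 2  # Fable counts at 2x on subscription plans


def effort_multiplier(low_cost, high_cost):
    """How many times more the high-effort run cost than the low-effort run."""
    return high_cost / low_cost


def tasks_within_allowance(allowance_units, units_per_task, usage_weight=1):
    """Whole tasks that fit in a plan allowance (hypothetical abstract units)."""
    return int(allowance_units // (units_per_task * usage_weight))


print(f"effort multiplier: {effort_multiplier(LOW_EFFORT_USD, MAX_EFFORT_USD):.1f}x")
print("tasks at 1x weight:", tasks_within_allowance(100, 4))
print("tasks at 2x weight:", tasks_within_allowance(100, 4, FABLE_USAGE_WEIGHT))
```

The effort multiplier is well above the 2× gap between the Fable and Opus price sheets, which is the point of the bullet: the setting you choose can matter more than the model you choose.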

In Claude Code

  • Select it with /model fable (or best). Requires v2.1.170+; it is not the default model.
  • It’s built for tasks too big to finish in one session: describe the outcome you want rather than step-by-step instructions, hand it ambiguous problems, and skip the “double-check your work” reminders — at high effort it verifies its own work.
  • Thinking cannot be turned off. The session toggle, alwaysThinkingEnabled, and MAX_THINKING_TOKENS=0 all have no effect on Fable.
  • Security work quietly falls back to Opus. Penetration testing, CTF exercises, and biology-related codebases trigger the safety classifiers “often on the first request,” per the official docs. You’d pay Fable prices and get Opus answers — for that work, just stay on Opus 4.8.

The restrictions agent builders should check first

The restriction | What it means for you
30-day data retention | Fable is a “Covered Model”: all traffic is kept for 30 days, and zero-data-retention agreements don’t apply. If your contract requires ZDR, Fable is off the table, full stop. This follows you to Amazon Bedrock too: AWS’s own launch post says retention is required for Mythos-class traffic there, and that once you opt in, “your data will leave AWS’s data and security boundary.”
Separate, lower rate limits | Fable doesn’t share the Opus limit pool: at Tier 4 it allows 4M input tokens/min vs the Opus pool’s 10M. Large multi-agent setups hit this first; throttling errors were reported on day one.
Refusals are responses, not errors | A blocked request returns HTTP 200 with stop_reason: "refusal" and a category (cyber, bio, reasoning_extraction). Agent code that only handles end_turn/tool_use will stop without an obvious error, so add a refusal branch before switching (full detection guide). An opt-in fallbacks parameter exists in beta, but not on Batch, Bedrock, Vertex, or Foundry.
Structured outputs unlisted | The supported-models list names Opus 4.8, 4.7, 4.6, Sonnet 4.6, and Haiku 4.5, but not Fable 5. If your pipeline depends on output_config.format, verify before switching.
Lower caching threshold | The minimum cacheable prompt drops to 512 tokens (vs 1,024 on Opus 4.8). Short agent system prompts that silently failed to cache on Opus will cache here, a small saving.
Fair refusal billing | Requests refused before any output aren’t billed; if a classifier triggers mid-stream, you pay only for what was already generated.
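Because a refusal arrives as a normal HTTP 200, the check has to live in your own response handling. Here is a minimal sketch of a refusal branch, assuming the response body has already been parsed into a plain dict. The stop_reason values and the three category names come from the table above; the field name refusal_category is our placeholder, since the source does not say what the field is called, so check the API reference before relying on it.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Category names as listed in the restrictions table.
KNOWN_REFUSAL_CATEGORIES = {"cyber", "bio", "reasoning_extraction"}


@dataclass(frozen=True)
class Outcome:
    action: str                     # "done", "run_tools", "refused", or "unknown"
    category: Optional[str] = None  # set only when action == "refused"


def classify(response: Mapping[str, Any]) -> Outcome:
    """Map a parsed response body to the next step for an agent loop."""
    stop_reason = response.get("stop_reason")
    if stop_reason == "end_turn":
        return Outcome("done")
    if stop_reason == "tool_use":
        return Outcome("run_tools")
    if stop_reason == "refusal":
        # ASSUMPTION: "refusal_category" is a hypothetical field name.
        return Outcome("refused", response.get("refusal_category"))
    # Anything else is surfaced explicitly instead of silently ending the loop.
    return Outcome("unknown")


outcome = classify({"stop_reason": "refusal", "refusal_category": "cyber"})
if outcome.action == "refused":
    recognized = outcome.category in KNOWN_REFUSAL_CATEGORIES
    print(f"refused (category={outcome.category}, recognized={recognized})")
```

What to do with a "refused" outcome is a separate decision: retrying on Opus 4.8 mirrors what apps like Claude Code do, while logging and stopping is the safer default for unattended runs.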

When to pay double — and when not

Pay it when the math from our agent-teams verdict applies: the time saved or risk avoided is worth more than the extra cost. That means migrations and refactors too big for one session, bugs that have already defeated Opus 4.8, design decisions where a wrong architecture is expensive, and overnight autonomous runs. The FrontierCode and CursorBench results say this is exactly where Fable’s lead is widest. And anything batchable — there the premium is zero.

Don’t pay it for routine work (Opus 4.8 is the substitute Anthropic’s own fallback design considers acceptable), for speed-sensitive interactive use (Fable generates ~60 tokens/second, below average for top models — the same money buys fast mode if speed is what you need), for security or biology work (frequent fallbacks mean Fable prices for Opus answers), or under zero-data-retention requirements.
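The "worth more than the extra cost" test can be written down as a break-even. The sketch below assumes a deliberately simple model of our own: expected cost per task equals the API cost plus the probability of a failed attempt times what a failure costs you. Every number in the example is a placeholder, not a measurement, and benchmark scores are not a substitute for your own failure rates.

```python
import math


def expected_cost(api_cost, p_fail, failure_cost):
    """Expected cost per task: API spend plus the expected cost of a failed attempt."""
    return api_cost + p_fail * failure_cost


def break_even_failure_cost(extra_api_cost, p_fail_cheap, p_fail_premium):
    """Smallest failure cost at which the premium model pays for itself.

    Solves extra_api_cost = (p_fail_cheap - p_fail_premium) * failure_cost.
    Returns math.inf when the premium model does not fail less often.
    """
    reduction = p_fail_cheap - p_fail_premium
    if reduction <= 0:
        return math.inf
    return extra_api_cost / reduction


# Placeholder inputs: a task costing $15 on Opus 4.8 and $30 on Fable 5,
# with hypothetical failure rates of 30% and 20%.
threshold = break_even_failure_cost(30 - 15, 0.30, 0.20)
print(f"Fable pays off when a failed attempt costs more than ${threshold:.0f}")
```

On the Batch API the extra API cost is zero, so the threshold is zero as well: any reduction in failures is free, which matches the "premium is zero" point above.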

Disclosure: bestagent.dev’s drafting pipeline runs in Claude Code, and this article was produced with Fable 5 selected. Judge the article on its sources, not on which model helped write it.

What would change our mind

Independent entries on LMArena, ARC-AGI, and aider when they arrive. Real token-per-task measurements: the system card claims Fable beat Opus on GDPval while using fewer turns and tokens. If that holds in practice, paying double looks better; if always-on thinking inflates token use, worse. Real-world fallback rates on ordinary codebases. And what the pay-as-you-go pricing actually looks like after June 23. One more thing worth checking from the launch discussion: FrontierCode was published just one day before Fable more than doubled Opus 4.8’s score on it, so it deserves a check for benchmark contamination (whether test material leaked into training data) once independent researchers get access.

Sources

  1. Anthropic — Claude Fable 5 and Claude Mythos 5 (announcement)
  2. Anthropic — Fable 5 & Mythos 5 system card (PDF; benchmarks pp.252–262)
  3. Anthropic docs — models overview
  4. Anthropic docs — introducing Fable 5 and Mythos 5
  5. Anthropic docs — pricing
  6. Anthropic docs — migration guide
  7. Anthropic docs — rate limits
  8. Anthropic docs — structured outputs (supported models)
  9. Claude Code docs — model configuration
  10. Vals AI — SWE-bench Verified leaderboard (independent)
  11. Artificial Analysis — Claude Fable 5 (independent)
  12. Hacker News — launch thread
  13. Simon Willison — day-one cost notes
  14. OpenAI — API pricing
  15. Google — Gemini API pricing
  16. AWS — Claude Fable 5 on AWS launch post (Bedrock retention requirement)
