Claude Fable 5 Fallbacks: How to Detect the Silent Switch to Opus 4.8 and Handle Refusals in Agent Code
A 900-point Hacker News thread put it bluntly: if Fable 5 stops helping you, you'll never know. True by default — but detectable. Here's exactly what happens on each surface, the response fields to check, the code branch agent loops need, and the cases where the right answer is to not pay for Fable at all.
Claude Fable 5 ships with safety classifiers watching for offensive-security, dangerous bio/chem, and capability-extraction requests. When one triggers, you don’t get an error. In client apps like Claude Code, your request is quietly re-run on Opus 4.8. On the raw API, you get a successful HTTP 200 response that contains a refusal. Anthropic says this touches fewer than 5% of sessions on average — but its own benchmark run hit it in 20.9% of trials, and the discussion that followed launch made the real problem clear: by default, nothing tells you it happened.
This page is the practical companion to our Fable 5 verdict: every behavior documented, every detection hook, and the decision rules. Sources at the end.
What actually happens, per surface
| Surface | On classifier trigger | Can you tell? |
|---|---|---|
| Claude Code (and client apps) | Request automatically re-runs on Opus 4.8; the session continues | No built-in indicator is documented as of June 10 |
| Raw Messages API | Request is blocked: HTTP 200 with stop_reason: "refusal" | Yes — check the response fields below |
| Raw API with fallbacks parameter (beta) | Automatically retried on the model you list | Yes: the response says which model answered |
The asymmetry is the trap. API users can detect every refusal; app users get the silent swap. If you’re paying 2× for Fable on work where the difference matters, the app default means you can’t audit what you actually got.
Detecting refusals in your own agent code
A blocked request is a normal response, not an exception. Three fields matter:
- stop_reason comes back as "refusal" instead of end_turn or tool_use.
- stop_details.category names which classifier fired: "cyber", "bio", or "reasoning_extraction".
- The response is HTTP 200, so retry logic keyed on status codes will never see it.
The branch every Fable-bound agent loop needs:

    # Primary attempt on Fable 5 ("..." stands for your usual request arguments).
    response = client.messages.create(model="claude-fable-5", ...)
    if response.stop_reason == "refusal":
        # stop_details may be absent, so guard before reading the category.
        category = response.stop_details.category if response.stop_details else None
        log_fallback(task_id, category)  # count these; see below
        response = client.messages.create(model="claude-opus-4-8", ...)  # your own fallback

Loops that only branch on end_turn and tool_use don’t crash on a refusal — they stop making progress with no visible error, usually mid-pipeline. Add the branch before switching models, not after the first silent stall.
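One way to make the stall impossible is to route every response through a single dispatcher that raises on any stop reason it does not recognize. The sketch below is illustrative only: FakeResponse, StopDetails, next_action, refusal_category, and UnhandledStopReason are hypothetical names, and the stand-in classes model only the fields named above, not the SDK's real response type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopDetails:
    """Hypothetical stand-in for the stop_details object described above."""
    category: Optional[str] = None


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an API response; only the fields the text names."""
    stop_reason: str
    stop_details: Optional[StopDetails] = None


class UnhandledStopReason(RuntimeError):
    """Raised so an unexpected stop_reason fails loudly instead of stalling."""


def next_action(response) -> str:
    """Map a response to the loop's next step: 'done', 'run_tools', or 'fallback'."""
    if response.stop_reason == "end_turn":
        return "done"
    if response.stop_reason == "tool_use":
        return "run_tools"
    if response.stop_reason == "refusal":
        return "fallback"
    # Anything else (for example a token-limit stop) is surfaced, not ignored.
    raise UnhandledStopReason(f"unexpected stop_reason: {response.stop_reason!r}")


def refusal_category(response) -> Optional[str]:
    """Return the classifier category, or None when stop_details is missing."""
    details = getattr(response, "stop_details", None)
    return details.category if details else None


blocked = FakeResponse("refusal", StopDetails("cyber"))
print(next_action(blocked), refusal_category(blocked))
```

A production loop would likely give other stop reasons their own branches; the point of the sketch is only that no stop reason falls through without a decision.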
Count your fallback rate. The published numbers range from 2% (Artificial Analysis’s independent task measurements) through under 5% (Anthropic’s session average) to 20.9% (Anthropic’s own Terminal-Bench trials), and each uses a different denominator: tasks, sessions, and trials. Your codebase has its own number, and it decides whether Fable is worth paying for on that work. One log line per refusal tells you within a week.
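A minimal sketch of that counting, assuming you record one entry per request. FallbackStats is a hypothetical helper, not part of any SDK, and the sample data is made up purely to show the arithmetic. Counting per request is itself an assumption; switch the denominator to sessions or tasks if that is what you want to compare against.

```python
from collections import Counter
from typing import Optional


class FallbackStats:
    """Hypothetical in-memory tally of refusals per classifier category."""

    def __init__(self) -> None:
        self.total = 0
        self.refusals = Counter()

    def record(self, stop_reason: str, category: Optional[str] = None) -> None:
        """Count every request; count refusals separately by category."""
        self.total += 1
        if stop_reason == "refusal":
            self.refusals[category or "unknown"] += 1

    def rate(self) -> float:
        """Fraction of recorded requests that ended in a refusal."""
        if self.total == 0:
            return 0.0
        return sum(self.refusals.values()) / self.total


stats = FallbackStats()
# Made-up sample: four requests, one of them refused by the "cyber" classifier.
for reason, cat in [("end_turn", None), ("refusal", "cyber"),
                    ("tool_use", None), ("end_turn", None)]:
    stats.record(reason, cat)

print(f"{stats.rate():.1%}", dict(stats.refusals))  # prints: 25.0% {'cyber': 1}
```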
The opt-in fallbacks parameter
The API can do the retry for you: an opt-in fallbacks parameter (beta) lists the model to switch to when a classifier fires, and the response reports which model actually answered — automatic and auditable, unlike the client-app behavior.
Where it doesn’t work, per the migration guide: the Message Batches API, Amazon Bedrock, Vertex AI, and Microsoft Foundry. On those surfaces the manual branch above is the only option. (It works on the Claude API and Claude Platform on AWS.)
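If one harness targets several surfaces, the support list above can be kept as data so the choice between the two strategies is explicit. This is a sketch under assumptions: the surface keys and the retry_strategy function are hypothetical labels invented for illustration, and the table only restates the surfaces named in the paragraph above.

```python
# Hypothetical surface labels; True means the opt-in fallbacks parameter
# is described as working there, False means it is described as unsupported.
FALLBACKS_SUPPORTED = {
    "claude_api": True,
    "claude_platform_on_aws": True,
    "message_batches": False,
    "amazon_bedrock": False,
    "vertex_ai": False,
    "microsoft_foundry": False,
}


def retry_strategy(surface: str) -> str:
    """Return 'fallbacks_parameter' or 'manual_branch' for a surface label.

    Unknown surfaces default to the manual refusal branch, the conservative
    choice because it does not depend on the beta parameter.
    """
    if FALLBACKS_SUPPORTED.get(surface, False):
        return "fallbacks_parameter"
    return "manual_branch"


for name in ("claude_api", "amazon_bedrock"):
    print(name, "->", retry_strategy(name))
```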
Billing is fair in both directions: a request refused before any output isn’t billed; if a classifier fires mid-stream you pay only for what was generated, and a “fallback credit” refunds the prompt-cache cost of the model switch.
In Claude Code
- The official docs are direct about who hits this: penetration testing, CTF exercises, and biology-adjacent codebases trigger fallback “often on the first request.” Security work on Fable is effectively Opus work at Fable prices.
- The fallback target is configurable: ANTHROPIC_DEFAULT_FABLE_MODEL changes which model Claude Code re-runs on (default: Opus 4.8 on the Anthropic API). DISABLE_PROMPT_CACHING_FABLE exists for cache-related debugging.
- No indicator of a completed fallback is documented as of June 10. If your work is sensitive to which model answered, the raw API with the refusal branch is currently the only auditable path.
When to stop fighting and pin Opus
- Security or biology work: the docs say first-request triggers are routine. Run /model opus (or pin claude-opus-4-8 in your harness) and keep half your money.
- Pipelines that can’t tolerate a mid-run model swap (evals, reproducibility-sensitive runs): either use the refusal branch to fail loudly, or don’t use Fable.
- Batch workloads: no fallbacks support, but also no price premium, since Fable via Batch costs the same as interactive Opus. A refusal branch plus the 50% batch discount is the best combination for offline work.
- Everything else: keep Fable, add the branch, log the rate, and let your own number decide.
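The rules above can be written down as a small function so a harness applies them consistently. This is a sketch, not a recommendation engine: Plan, plan_for, the domain labels, and the on_refusal values are hypothetical names, and the order in which the rules are checked is an assumption.

```python
from dataclasses import dataclass

FABLE = "claude-fable-5"
OPUS = "claude-opus-4-8"


@dataclass(frozen=True)
class Plan:
    model: str
    # "not_applicable": pinned to Opus, so there is no Fable fallback to handle
    # "raise": treat a refusal as a hard failure (fail loudly)
    # "fall_back_and_log": retry on another model and count the refusal
    on_refusal: str


def plan_for(domain: str, tolerates_model_swap: bool) -> Plan:
    """Pick a model and refusal policy from the decision rules in the list above."""
    # Security and biology work: first-request triggers are routine, so pin Opus.
    if domain in {"security", "biology"}:
        return Plan(OPUS, "not_applicable")
    # Evals and reproducibility-sensitive runs: a swap must never pass silently.
    if not tolerates_model_swap:
        return Plan(FABLE, "raise")
    # Batch and everything else: keep Fable, add the branch, log the rate.
    return Plan(FABLE, "fall_back_and_log")


print(plan_for("security", True))
print(plan_for("web_backend", False))
print(plan_for("web_backend", True))
```

Batch workloads land in the last branch here; the only difference for them is that the retry has to be the manual refusal branch, since the fallbacks parameter is not available on that surface.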
What would change this page
A documented fallback indicator in Claude Code (the single most-requested fix in the launch discussion); the fallbacks parameter reaching GA or the missing surfaces; published per-domain trigger rates beyond the 5%/20.9%/2% spread; or changes to the classifier categories.
Companion reading
- Claude Fable 5 vs Opus 4.8: the launch-day verdict — pricing, benchmarks with the right labels, and the rest of the fine print
- Claude Code pricing, decoded — plans, usage weights, and which operations consume quota
- Managing long-running agents — the autonomous workflows where silent model swaps hurt most
- Your CLAUDE.md is an attack surface — the other side of agent-security: what your agent auto-trusts
Sources
- Anthropic docs — introducing Fable 5 and Mythos 5 (fallback design, categories)
- Anthropic docs — migration guide (refusal fields, fallbacks parameter, billing, surface support)
- Anthropic — Fable 5 & Mythos 5 system card (20.9% Terminal-Bench refusal rate, p.255)
- Claude Code docs — model configuration (trigger workloads, env vars)
- Artificial Analysis — Fable 5 (2% measured fallback on GDPval tasks)
- Hacker News — “If Claude Fable stops helping you, you’ll never know”