How do I know if Claude Fable 5 fell back to Opus 4.8?

On the raw API, a blocked request returns HTTP 200 with stop_reason "refusal" and a stop_details.category (cyber, bio, or reasoning_extraction) — check those fields. With the opt-in fallbacks parameter (beta), the response reports which model answered. In Claude Code and other client apps, the fallback is automatic and no indicator is documented as of June 10, 2026.

How often does Fable 5 trigger a safety fallback?

Anthropic says fewer than 5% of sessions on average. Its own Terminal-Bench run hit safety refusals in 20.9% of trials, and Artificial Analysis measured fallbacks on 2% of its benchmark tasks. Security and biology workloads trigger far more often — per the official docs, often on the first request. Log your own rate; it varies by codebase.

Does a refused Fable 5 request cost money?

A request refused before any output is generated isn't billed. If a classifier fires mid-stream, you pay only for what was already generated, and a fallback credit refunds the prompt-cache cost of switching models.

Should security researchers use Fable 5 at all?

Generally no. Penetration testing, CTF exercises, and biology-adjacent codebases trigger the classifiers frequently — often on the first request, per Anthropic's docs — so you'd pay Fable's 2× price and receive Opus 4.8 answers. Pin Opus 4.8 directly for that work.

Claude Fable 5 Fallbacks: How to Detect the Silent Switch to Opus 4.8 and Handle Refusals in Agent Code

Claude Fable 5 ships with safety classifiers watching for offensive-security, dangerous bio/chem, and capability-extraction requests. When one triggers, you don’t get an error. In client apps like Claude Code, your request is quietly re-run on Opus 4.8. On the raw API, you get a successful HTTP 200 response that contains a refusal. Anthropic says this touches fewer than 5% of sessions on average — but its own benchmark run hit it in 20.9% of trials, and the discussion that followed launch made the real problem clear: by default, nothing tells you it happened.

This page is the practical companion to our Fable 5 verdict: every behavior documented, every detection hook, and the decision rules. Sources at the end.

Update — July 1, 2026: Fable 5 is back — and this guide matters more than before. After an 18-day suspension under a US export-control directive, Anthropic restored Fable 5 and Mythos 5 on July 1 behind a new safety classifier aimed at the reported jailbreak — one that, per Anthropic, redirects that technique to Opus 4.8 “in over 99% of cases.” That is exactly the silent switch this page is about, now with a named trigger: the flagged technique is asking the model to read a codebase and fix its flaws — the cyber-category work covered below. The detection branches here apply directly. (During the June 12–30 outage there was nothing silent to detect — the API returned a hard 404; that’s over.) Full timeline and what changed: Fable 5 suspended, then restored.

Update (June 11): The backlash worked. Anthropic told Wired it is making these safeguards visible, and apologized: “We made the wrong tradeoff and we apologize for not getting the balance right.” Three changes roll out starting this week. A previously undisclosed category — requests targeting frontier LLM development, which the system card had handled by quietly degrading answers rather than refusing — joins the visible-fallback path. Client surfaces will say so when a request falls back to Opus 4.8. And API refusals will carry an explicit reason, with server-side fallback support following within days. Everything below stays accurate until those changes reach your logs: keep the detection branch in place and verify against your own stop_details once they ship. Simon Willison’s counterpoint is worth reading too — visibility isn’t the fix he wanted; he argues the category should be dropped entirely.

What actually happens, per surface

| Surface | On classifier trigger | Can you tell? |
| --- | --- | --- |
| Claude Code (and client apps) | Request automatically re-runs on Opus 4.8; the session continues | No built-in indicator is documented as of June 10 |
| Raw Messages API | Request is blocked: HTTP 200 with `stop_reason: "refusal"` | Yes: check the response fields below |
| Raw API with `fallbacks` parameter (beta) | Automatically retried on the model you list | Yes: the response says which model answered |

The asymmetry is the trap. API users can detect every refusal; app users get the silent swap. If you’re paying 2× for Fable on work where the difference matters, the app default means you can’t audit what you actually got.

Detecting refusals in your own agent code

A blocked request is a normal response, not an exception. Three fields matter:

stop_reason comes back as "refusal" instead of end_turn or tool_use.
stop_details.category names which classifier fired: "cyber", "bio", or "reasoning_extraction".
The response is HTTP 200 — retry logic keyed on status codes will never see it.

The branch every Fable-bound agent loop needs:

response = client.messages.create(model="claude-fable-5", ...)

if response.stop_reason == "refusal":
    category = response.stop_details.category if response.stop_details else None
    log_fallback(task_id, category)          # count these — see below
    response = client.messages.create(model="claude-opus-4-8", ...)  # your own fallback

Loops that only branch on end_turn and tool_use don’t crash on a refusal; they stop making progress with no visible error, usually mid-pipeline. Add the branch before you move a loop onto Fable, not after the first silent stall.
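To make the silent-stall failure mode concrete, here is a minimal sketch of a dispatcher that maps every stop reason to an explicit next action and raises on anything it does not recognize. It assumes only the fields described above (stop_reason and stop_details.category). `FakeResponse`, `StopDetails`, and `next_step` are hypothetical names used for illustration, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopDetails:
    """Hypothetical stand-in for the stop_details object."""
    category: Optional[str] = None


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an API response; only the fields this page names."""
    stop_reason: str
    stop_details: Optional[StopDetails] = None


def next_step(response) -> str:
    """Return the loop's next action; never fall through silently."""
    reason = response.stop_reason
    if reason == "end_turn":
        return "finish"
    if reason == "tool_use":
        return "run_tools"
    if reason == "refusal":
        # stop_details may be absent, so read it defensively.
        details = getattr(response, "stop_details", None)
        category = getattr(details, "category", None) or "unknown"
        return f"fallback:{category}"
    # Any stop reason the loop has no branch for becomes a loud error.
    raise ValueError(f"unhandled stop_reason: {reason!r}")
```

Calling `next_step(FakeResponse("refusal", StopDetails("cyber")))` returns `"fallback:cyber"`. A real loop would add branches for whatever other stop reasons it expects before relying on the final `raise`.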

Count your fallback rate. The published numbers range from 2% of tasks (Artificial Analysis’s independent measurement) through fewer than 5% of sessions (Anthropic’s average) to 20.9% of trials (Anthropic’s own Terminal-Bench run). Those figures use different denominators, so they are not directly comparable. Your codebase has its own number, and it decides whether Fable is worth paying for on that work. One log line per refusal tells you within a week.
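One way to keep that count is a small in-memory tally, sketched below. `FallbackLog` and its methods are hypothetical names; a production version would presumably write to your existing metrics or logging system instead of holding counts in memory.

```python
from collections import Counter
from typing import Optional


class FallbackLog:
    """Tally requests and refusals so a per-workload rate can be computed."""

    def __init__(self) -> None:
        self.requests = 0
        self.refusals = Counter()  # classifier category -> number of refusals

    def record(self, stop_reason: str, category: Optional[str] = None) -> None:
        """Record one completed request and, if it was refused, its category."""
        self.requests += 1
        if stop_reason == "refusal":
            self.refusals[category or "unknown"] += 1

    def rate(self) -> float:
        """Fraction of recorded requests that ended in a refusal."""
        if self.requests == 0:
            return 0.0
        return sum(self.refusals.values()) / self.requests
```

Call `record(response.stop_reason, category)` once per request, then compare `rate()` against the published figures after a representative stretch of work. The per-category counts in `refusals` show which classifier is responsible.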

The opt-in `fallbacks` parameter

The API can do the retry for you: an opt-in fallbacks parameter (beta) lists the model to switch to when a classifier fires, and the response reports which model actually answered — automatic and auditable, unlike the client-app behavior.

Where it doesn’t work, per the migration guide: the Message Batches API, Amazon Bedrock, Vertex AI, and Microsoft Foundry. On those surfaces the manual branch above is the only option. (It works on the Claude API and Claude Platform on AWS.)
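If one harness targets several surfaces, the support list above can be encoded once so each call site picks the right strategy. This is a sketch under assumptions: the surface labels are hypothetical identifiers chosen for this example, and the membership of each set reflects only what this page reports from the migration guide.

```python
# Hypothetical labels; membership follows the support list described above.
SERVER_SIDE_FALLBACK = {"claude_api", "claude_platform_on_aws"}
MANUAL_BRANCH_ONLY = {"message_batches", "amazon_bedrock", "vertex_ai", "microsoft_foundry"}


def fallback_strategy(surface: str) -> str:
    """Return 'server_side' where the fallbacks parameter is reported to work,
    'manual_branch' where it is not, and raise for surfaces we have no data on."""
    if surface in SERVER_SIDE_FALLBACK:
        return "server_side"
    if surface in MANUAL_BRANCH_ONLY:
        return "manual_branch"
    raise ValueError(f"no fallback information for surface: {surface!r}")
```

Raising on an unknown surface is deliberate: guessing a default would reintroduce the silent behavior this page is trying to remove.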

Billing is fair in both directions: a request refused before any output isn’t billed; if a classifier fires mid-stream you pay only for what was generated, and a “fallback credit” refunds the prompt-cache cost of the model switch.

In Claude Code

The official docs are direct about who hits this: penetration testing, CTF exercises, and biology-adjacent codebases trigger fallback “often on the first request.” Security work on Fable is effectively Opus work at Fable prices.
The fallback target is configurable: ANTHROPIC_DEFAULT_FABLE_MODEL changes which model Claude Code re-runs on (default: Opus 4.8 on the Anthropic API). DISABLE_PROMPT_CACHING_FABLE exists for cache-related debugging.
No indicator of a completed fallback is documented as of June 10. If your work is sensitive to which model answered, the raw API with the refusal branch is currently the only auditable path.

When to stop fighting and pin Opus

Security or biology work: the docs say first-request triggers are routine. Run /model opus (or pin claude-opus-4-8 in your harness) and keep half your money.
Pipelines that can’t tolerate a mid-run model swap (evals, reproducibility-sensitive runs): either use the refusal branch to fail loudly, or don’t use Fable.
Batch workloads: no fallbacks support — but also no price premium, since Fable via Batch costs the same as interactive Opus. A refusal branch plus the 50% batch discount is the best combination for offline work.
Everything else: keep Fable, add the branch, log the rate, and let your own number decide.
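For the eval and reproducibility case, "fail loudly" can be as simple as turning a refusal into an exception so the run aborts instead of mixing models. The sketch below assumes the response exposes stop_reason and stop_details.category as described earlier. `ModelRefusalError` and `require_answer` are hypothetical names.

```python
from typing import Optional


class ModelRefusalError(RuntimeError):
    """Raised when a response is a classifier refusal rather than an answer."""

    def __init__(self, category: Optional[str]) -> None:
        super().__init__(f"request refused (category={category}); aborting run")
        self.category = category


def require_answer(response):
    """Return the response unchanged, or raise if it is a refusal."""
    if getattr(response, "stop_reason", None) == "refusal":
        # stop_details may be missing, so read the category defensively.
        details = getattr(response, "stop_details", None)
        raise ModelRefusalError(getattr(details, "category", None))
    return response
```

Wrapping each call as `response = require_answer(client.messages.create(...))` means a refusal stops the pipeline with a traceback that names the category, rather than producing a run with answers from two different models.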

What would change this page

A documented fallback indicator in Claude Code (the single most-requested fix in the launch discussion, promised on June 11; see the update at the top. This page updates again when it ships); the fallbacks parameter reaching general availability or arriving on the surfaces that currently lack it; published per-domain trigger rates beyond the 5%/20.9%/2% spread; or changes to the classifier categories.

Companion reading

Claude Fable 5 vs Opus 4.8: the launch-day verdict — pricing, benchmarks with the right labels, and the rest of the fine print
Claude Code pricing, decoded — plans, usage weights, and which operations consume quota
Managing long-running agents — the autonomous workflows where silent model swaps hurt most
Your CLAUDE.md is an attack surface — the other side of agent-security: what your agent auto-trusts