Claude Fable 5 API Guide (2026): Pricing, Setup & Code Examples

Published June 12, 2026 · 10 min read

Anthropic dropped Claude Fable 5 on June 9, 2026, and if you write code for a living, this is the one to pay attention to. It's the publicly released sibling of the Mythos 5 model that's been making security headlines for months. Same underlying weights, with the cybersecurity guardrails dialed up and an API you can actually call today.

Here's the short version: Fable 5 is built for autonomous, long-horizon work. Big agent runs, whole-repo refactors, the kind of task you used to babysit across five separate prompts. It's expensive, it's fast at the hard stuff, and you should not point it at every request. Let's get into the details that actually matter when you wire it up.

The Specs You Need Before Writing Code

The model ID is claude-fable-5. Drop that into the model field of a normal Messages API call and you're talking to it. No special endpoint, no separate SDK.

Spec	Claude Fable 5
Model ID	`claude-fable-5`
Context window	1,000,000 tokens (default)
Max output	128,000 tokens per request
Input price	~$10 / million tokens
Output price	~$50 / million tokens
Best for	Agentic coding, long-horizon reasoning
Reported SWE-Bench Pro	~80% (independent testing)

That 1M context window is the headline feature for coding work. You can feed it a genuinely large codebase, not just a handful of files, and it keeps the thread across a long agent loop. The 128k max output also means it can write a lot in a single turn without you stitching responses together.
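Before you send a huge prompt, it helps to check it against those limits. Here's a minimal sketch: the constants come from the spec table, the function name is made up for illustration, and it assumes input and output tokens share the same window, which is how other Claude models behave but isn't confirmed for Fable 5.

```python
# Limits taken from the spec table above; treat them as approximate.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 128_000


def fits_in_context(input_tokens: int, max_tokens: int) -> bool:
    """Return True if a request fits the stated limits.

    Assumes input and output share one context window (an assumption
    carried over from other Claude models, not confirmed for Fable 5).
    """
    if input_tokens < 0 or max_tokens <= 0:
        raise ValueError("input_tokens must be >= 0 and max_tokens > 0")
    if max_tokens > MAX_OUTPUT_TOKENS:
        return False  # asks for more output than one request allows
    return input_tokens + max_tokens <= CONTEXT_WINDOW_TOKENS


print(fits_in_context(900_000, 8_000))   # True
print(fits_in_context(995_000, 8_000))   # False: prompt plus output overflows
print(fits_in_context(10_000, 200_000))  # False: output cap exceeded
```

Feed it a real token count rather than a guess from character length; the rough rule of four characters per token drifts a lot on source code.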

Pricing Reality Check: Fable 5 vs Opus 4.8 vs GPT-5.5

This is where you need to be honest with yourself about what you're paying for. Fable 5 sits at roughly 2x the price of Opus 4.8 on both sides of the meter.

Model	Input ($/M)	Output ($/M)	Context
Claude Fable 5	~$10	~$50	1M
Claude Opus 4.8	~$5	~$25	1M
GPT-5.5	~$5	~$30	varies

So the question isn't "is Fable 5 good." It clearly is. The question is whether your task is hard enough to justify double the bill. For a one-shot bug fix or a routine commit message, it's overkill. For an autonomous agent that has to plan a multi-file migration, hold state across dozens of tool calls, and not lose the plot halfway through, paying double can still cost less than a cheaper model's failed run that you have to throw away and restart.

Rule of thumb: if a cheaper model finishes the task on the first try, use it. Reach for Fable 5 when the failure cost of a weaker model (retries, wasted tokens, broken output) is higher than the price gap.
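To put numbers on that rule of thumb, here's a rough sketch. It assumes every retry costs the same and succeeds independently, so the expected spend is the per-attempt cost divided by the success rate. The token counts and success rates are invented for illustration; the prices are the approximate ones from the table.

```python
def attempt_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one attempt; prices are per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def expected_cost(cost_per_attempt, success_rate):
    """Expected spend until the first success, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical task: 50k input tokens and 4k output tokens per attempt.
opus = attempt_cost(50_000, 4_000, 5, 25)    # about $0.35 per attempt
fable = attempt_cost(50_000, 4_000, 10, 50)  # about $0.70 per attempt

# Hypothetical success rates; measure your own before trusting this.
print(round(expected_cost(opus, 0.40), 3))   # 0.875
print(round(expected_cost(fable, 0.90), 3))  # 0.778
```

With these made-up rates the pricier model wins. Push the cheaper model's success rate up to 50% and it's cheaper again, which is why the measured failure rate matters more than the sticker price.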

Minimal curl Call

Here's the smallest request that gets you a real Fable 5 response:

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-fable-5",
    "max_tokens": 2048,
    "messages": [
      {
        "role": "user",
        "content": "Plan a migration from Express to Fastify for a 40-route API. List the order of operations and risks."
      }
    ]
  }'

Notice max_tokens is set higher than you'd use for a chat model. With Fable 5 you're usually asking for substantial output (a plan, a diff, a full file), so don't choke it with a tiny limit.

Python: A Long-Context Agent Step

This is the pattern that justifies Fable 5's price. You're loading a large chunk of context and asking for a structured, multi-part answer in one shot.

import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def plan_refactor(repo_context: str, goal: str) -> str:
    resp = client.messages.create(
        model="claude-fable-5",
        max_tokens=8000,
        system=[{
            "type": "text",
            "text": (
                "You are a senior staff engineer. Produce a concrete, "
                "ordered refactor plan with file-level changes and risk notes."
            ),
            "cache_control": {"type": "ephemeral"}
        }],
        messages=[{
            "role": "user",
            "content": f"Goal: {goal}\n\nRepository context:\n{repo_context}"
        }]
    )

    usage = getattr(resp, "usage", None)
    if usage:
        print("input_tokens:", getattr(usage, "input_tokens", 0))
        print("output_tokens:", getattr(usage, "output_tokens", 0))
        print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", 0))

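    # Join every text block instead of assuming the first block is text.
    return "".join(block.text for block in resp.content if block.type == "text")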

Two things to call out. First, the system block uses cache_control. When you run an agent loop that hits this endpoint repeatedly with the same instructions, prompt caching lets the stable prefix be billed at the discounted cache-read rate instead of the full $10/M on every call. At Fable 5 prices, caching isn't a nice-to-have. It's the difference between a sane bill and a scary one. Second, always read the usage fields. At this price tier, flying blind on token counts is how teams get surprised at the end of the month.
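One caveat on the example above: the cached system block is only two sentences long. On existing Claude models, caching only applies once the cached prefix reaches a minimum length (on the order of a thousand tokens or more), and it's reasonable to assume Fable 5 behaves similarly. In an agent loop the big, stable thing is usually the repository context, so that's what you want inside the cached prefix. Here's a sketch of one way to lay that out; the helper name is hypothetical, and it only builds the request arguments, so it runs without an API key.

```python
def build_cached_request(instructions: str, repo_context: str, goal: str) -> dict:
    """Build kwargs for client.messages.create with a cacheable prefix.

    The stable parts (instructions, then repo context) go first in `system`.
    The cache_control marker on the last stable block asks the API to cache
    everything up to and including that block. Only the goal changes per call.
    """
    return {
        "model": "claude-fable-5",
        "max_tokens": 8000,
        "system": [
            {"type": "text", "text": instructions},
            {
                "type": "text",
                "text": f"Repository context:\n{repo_context}",
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": f"Goal: {goal}"}],
    }


# Usage (needs the anthropic package and an API key):
#   resp = client.messages.create(**build_cached_request(rules, repo, goal))
```

If cache_read_input_tokens stays at zero on the second call, the prefix is either too short or changing between requests.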

Node.js: Streaming Long Output

When Fable 5 writes a big file or a long plan, you don't want to sit on a blocking call. Stream it.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function generatePlan(taskText) {
  const stream = await client.messages.stream({
    model: "claude-fable-5",
    max_tokens: 8000,
    messages: [{
      role: "user",
      content: taskText
    }]
  });

  let full = "";
  for await (const event of stream) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      process.stdout.write(event.delta.text);
      full += event.delta.text;
    }
  }
  return full;
}

Streaming also gives you an early bailout. If the model heads in the wrong direction in the first few hundred tokens, you can abort instead of paying for 8,000 output tokens of a bad answer.
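The bailout itself is easy to sketch. This Python version works on any iterable of text chunks, so you can test it without touching the API. The function name, the looks_wrong callback, and the 400-character threshold are all illustrative choices, not anything the SDK provides.

```python
from typing import Callable, Iterable


def stream_with_bailout(
    chunks: Iterable[str],
    looks_wrong: Callable[[str], bool],
    check_after_chars: int = 400,
) -> tuple[str, bool]:
    """Collect streamed text, stopping early if the opening looks off-track.

    Returns (text_so_far, completed). The check runs once, as soon as at
    least `check_after_chars` characters have arrived. If the stream ends
    before that point, the check never runs.
    """
    parts: list[str] = []
    total = 0
    checked = False
    for chunk in chunks:
        parts.append(chunk)
        total += len(chunk)
        if not checked and total >= check_after_chars:
            checked = True
            if looks_wrong("".join(parts)):
                return "".join(parts), False  # stop reading; caller closes the stream
    return "".join(parts), True


# Usage with the Anthropic Python SDK (needs an API key):
#   with client.messages.stream(model="claude-fable-5", max_tokens=8000,
#                               messages=[...]) as stream:
#       text, ok = stream_with_bailout(stream.text_stream, looks_wrong)
```

Leaving the with block closes the connection, so an early return stops generation instead of letting it run to the token limit.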

When to Use Fable 5 (and When Not To)

Be deliberate. Fable 5 is a scalpel, not a default.

Use it for: autonomous coding agents, whole-repo analysis, multi-step migrations, long-horizon planning where context retention matters.
Skip it for: chat UIs, classification, summarization, routine commit messages, anything a Sonnet-tier model handles well.
Route around it: let a cheaper model do triage, then escalate only the hard tasks to Fable 5.

That last point is the real money-saver. A model router that sends easy work to a cheap model and reserves Fable 5 for the genuinely hard 10% of tasks can cut your bill dramatically while keeping output quality where it counts. If you'd rather not manage multiple provider keys to do that, an OpenAI-compatible gateway like KissAPI lets you reach Fable 5 alongside cheaper models through a single endpoint, so the routing logic lives in your code instead of your billing dashboard.
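A router doesn't have to be clever to pay for itself. The sketch below uses two blunt signals. The Task fields and the thresholds are invented for illustration, and the model IDs are the ones used in this article; tune the rules against your own failure data.

```python
from dataclasses import dataclass

# Model IDs as used in this article; the routing rules below are made up.
CHEAP_MODEL = "claude-sonnet-4-6"
HARD_MODEL = "claude-fable-5"


@dataclass
class Task:
    prompt: str
    files_touched: int = 1         # how many files the change spans
    needs_tool_loop: bool = False  # multi-step agent run vs. a single reply
    prior_failures: int = 0        # attempts that already failed on the cheap model


def pick_model(task: Task) -> str:
    """Send easy work to the cheap model and escalate only the hard cases."""
    if task.prior_failures > 0:
        return HARD_MODEL  # the cheap model already failed, so escalate
    if task.needs_tool_loop and task.files_touched > 5:
        return HARD_MODEL  # long-horizon, multi-file work
    return CHEAP_MODEL


print(pick_model(Task("Write a commit message for this diff")))
print(pick_model(Task("Migrate to Fastify", files_touched=40, needs_tool_loop=True)))
```

The prior_failures rule is the one that earns its keep: try cheap first, and pay Fable 5 prices only after the cheap attempt has actually failed.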

Estimating Your Bill Before You Commit

Before you flip Fable 5 on in production, do the math on a realistic workload. Take your average input and output token counts per request, multiply by your daily request volume, and apply the $10/$50 rates. At this tier the numbers move fast, so it pays to model it first. Our API cost calculator handles the arithmetic, and the token counter helps you size a request before you send it.
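If you'd rather see the arithmetic spelled out, here's a small sketch of it. The rates are the approximate ones quoted above, the workload numbers are hypothetical, and it ignores caching discounts, so treat the result as an upper-bound estimate.

```python
INPUT_PRICE_PER_M = 10.0   # approximate, from the pricing table
OUTPUT_PRICE_PER_M = 50.0  # approximate, from the pricing table


def estimate_cost(avg_input_tokens, avg_output_tokens, requests_per_day, days=30):
    """Estimate spend per request, per day, and over a period of `days`."""
    per_request = (
        avg_input_tokens * INPUT_PRICE_PER_M
        + avg_output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    daily = per_request * requests_per_day
    return {"per_request": per_request, "daily": daily, "period": daily * days}


# Hypothetical workload: 120k input and 6k output tokens, 200 requests a day.
est = estimate_cost(120_000, 6_000, 200)
print(f"${est['per_request']:.2f} per request")  # $1.50 per request
print(f"${est['daily']:.2f} per day")            # $300.00 per day
print(f"${est['period']:.2f} per 30 days")       # $9000.00 per 30 days
```

Notice that input tokens account for $1.20 of that $1.50. On long-context workloads the prompt, not the answer, drives the bill.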

Try Claude Fable 5 Through One Endpoint

Create a free account at api.kissapi.ai/register and call Fable 5 alongside cheaper models with a single OpenAI-compatible key.


Frequently Asked Questions

What is the Claude Fable 5 API model ID?

The API model string is claude-fable-5. You pass it in the model field of a standard Anthropic Messages API request, exactly like you would call claude-opus-4-8 or claude-sonnet-4-6.

How much does Claude Fable 5 cost per token?

Claude Fable 5 is priced at about $10 per million input tokens and $50 per million output tokens. That's roughly 2x Opus 4.8 on both sides, so it's best reserved for hard agentic and long-horizon coding work rather than every request.

What is the Claude Fable 5 context window?

Fable 5 ships with a 1 million token context window by default and supports up to 128,000 output tokens per request, which makes it suitable for whole-repo reasoning and long multi-step agent runs.