Claude Sonnet 5 API Migration Guide (2026): 1M Context, Adaptive Thinking, and Pricing
Anthropic announced Claude Sonnet 5 on July 2, 2026, and the news matters for developers because this isn't just another model name in a dropdown. Sonnet 5 is now available on the Claude API and Claude Code, ships with a native 1M-token context window, turns adaptive thinking on by default, and launches with temporary pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026.
That's attractive. It's also the kind of launch that can quietly break production if you treat it like a pure string replacement. Sonnet 5 removes manual extended thinking, rejects non-default sampling parameters, and uses a newer tokenizer that can produce roughly 30% more tokens for the same text. If you run agents, code review bots, support automation, or long-context retrieval workflows, you should migrate deliberately.
This guide is the practical version: what changed, what code to update, what to measure, and where teams are likely to waste money.
The short version: what changed?
| Area | Claude Sonnet 5 behavior | Developer impact |
|---|---|---|
| Model ID | claude-sonnet-5 | Update your model string and config allowlists. |
| Context | 1M tokens by default | Long-codebase and document workflows get more room. |
| Output | Up to 128k tokens | Raise max_tokens only when you really need it. |
| Thinking | Adaptive thinking on by default | Budget for hidden reasoning inside the output limit. |
| Sampling | Non-default temperature, top_p, top_k return 400 | Remove these parameters from migration requests. |
| Tokenizer | Same text may become ~30% more tokens | Recount prompts and revisit cost budgets. |
| Pricing | $2/$10 per MTok (input/output) through Aug 31, 2026; then $3/$15 | Good launch window, but don't ignore tokenizer inflation. |
Step 1: change the model ID, but don't stop there
The minimum change is simple:
model = "claude-sonnet-4-6" # before
model = "claude-sonnet-5" # after
That part takes five seconds. The migration work is everything around it: gateway routing, request validators, test fixtures, cost dashboards, retry logic, and any code that assumes a smaller context window. If your app has a hardcoded model allowlist, update it in the same pull request. If your billing dashboard groups models by prefix, make sure Sonnet 5 doesn't fall into an "unknown" bucket.
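As a rough illustration of the allowlist and billing-bucket checks mentioned above, here is a minimal sketch. The names `ALLOWED_MODELS`, `BILLING_PREFIXES`, `validate_model`, and `billing_bucket` are hypothetical and not part of any SDK; adapt them to however your gateway stores its config.

```python
# Hypothetical config: these names are illustrative, not part of any SDK.
ALLOWED_MODELS = {"claude-sonnet-4-6", "claude-sonnet-5"}

# Checked in order; the first matching prefix wins.
BILLING_PREFIXES = [
    ("claude-sonnet-5", "sonnet-5"),
    ("claude-sonnet-4", "sonnet-4"),
]


def validate_model(model: str) -> str:
    """Reject any model ID that is not on the allowlist."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model not allowlisted: {model}")
    return model


def billing_bucket(model: str) -> str:
    """Map a model ID to a dashboard bucket, falling back to 'unknown'."""
    for prefix, bucket in BILLING_PREFIXES:
        if model.startswith(prefix):
            return bucket
    return "unknown"
```

A unit test asserting that `billing_bucket("claude-sonnet-5")` is not `"unknown"` is a cheap way to catch the dashboard problem before the first invoice does.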
If you use an OpenAI-compatible gateway such as KissAPI to route across Claude, GPT, Gemini, and other models, add the new model as a separate route first. Don't overwrite your Sonnet 4.6 production route until you have real usage data.
Step 2: remove unsupported sampling parameters
This is the migration bug I expect to see most often. Many teams set temperature by habit. On Claude Sonnet 5, setting temperature, top_p, or top_k to non-default values returns a 400 error.
Bad migration request:
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-5",
    "max_tokens": 1200,
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "Review this patch."}]
  }'
Safer request:
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-5",
    "max_tokens": 1200,
    "system": "Be concise. Prefer deterministic, production-safe answers.",
    "messages": [{"role": "user", "content": "Review this patch."}]
  }'
Use the system prompt to guide style and consistency. Don't fight the API.
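If requests are built in several places, one option is to strip the sampling fields in a single gateway-side helper rather than hunting down every call site. The sketch below assumes request bodies are plain dictionaries; `strip_sampling_params` and `SAMPLING_FIELDS` are hypothetical names. It removes the fields even when they carry default values, which is the conservative choice.

```python
import logging

logger = logging.getLogger(__name__)

# Sampling fields that, per this guide, Sonnet 5 rejects at non-default values.
SAMPLING_FIELDS = ("temperature", "top_p", "top_k")


def strip_sampling_params(request: dict) -> dict:
    """Return a copy of the request body with the sampling fields removed."""
    cleaned = dict(request)  # shallow copy so the caller's dict is untouched
    for field in SAMPLING_FIELDS:
        if field in cleaned:
            value = cleaned.pop(field)
            logger.warning("dropping %s=%r before calling Sonnet 5", field, value)
    return cleaned
```

Logging each dropped field makes it easier to find the callers that still set `temperature` by habit.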
Step 3: migrate manual extended thinking to adaptive thinking
Sonnet 5 runs with adaptive thinking by default. Manual extended thinking with thinking: {"type":"enabled", "budget_tokens": N} is gone and returns a 400. If your current wrapper injects a thinking budget for every Claude request, update it before switching the model ID.
# Not supported on Claude Sonnet 5
thinking = {"type": "enabled", "budget_tokens": 32000}
# Use adaptive thinking instead
thinking = {"type": "adaptive"}
There's one practical catch: max_tokens is a hard limit for total output, including thinking plus final answer text. If you previously used max_tokens: 1000 for short answers, Sonnet 5 may have less room for visible output than you expect. Test with real prompts, not toy prompts.
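A wrapper that injects thinking budgets could be migrated with something like the sketch below. It assumes request bodies are plain dictionaries; `migrate_thinking` and `hit_output_limit` are hypothetical helper names. The second helper relies on the Messages API reporting `stop_reason == "max_tokens"` when a response is cut off at the output limit.

```python
def migrate_thinking(request: dict) -> dict:
    """Rewrite a manual thinking budget to adaptive thinking."""
    migrated = dict(request)  # shallow copy so the caller's dict is untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        # The old budget_tokens value is dropped along with the manual mode.
        migrated["thinking"] = {"type": "adaptive"}
    return migrated


def hit_output_limit(stop_reason: str) -> bool:
    """True when the response stopped because max_tokens ran out."""
    return stop_reason == "max_tokens"
```

Counting how often `hit_output_limit` is true during shadow tests gives a direct signal of whether an old `max_tokens` value still leaves room for the visible answer.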
Python migration wrapper
Here's a small wrapper that strips the risky fields and logs usage so you can compare Sonnet 4.6 and Sonnet 5 before a full cutover.
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Sampling fields that Sonnet 5 rejects when set to non-default values.
UNSUPPORTED_ON_SONNET_5 = {"temperature", "top_p", "top_k"}


def ask_sonnet5(messages, system=None, max_tokens=2000, **extra):
    # Drop sampling fields that a legacy caller may still pass in.
    kwargs = {k: v for k, v in extra.items() if k not in UNSUPPORTED_ON_SONNET_5}
    kwargs.update(
        model="claude-sonnet-5",
        max_tokens=max_tokens,
        messages=messages,
    )
    if system:
        kwargs["system"] = system

    # Optional: be explicit. Omitting thinking also enables adaptive thinking.
    kwargs["thinking"] = {"type": "adaptive"}

    resp = client.messages.create(**kwargs)

    # Log usage so Sonnet 4.6 and Sonnet 5 runs can be compared.
    usage = getattr(resp, "usage", None)
    if usage:
        print({
            "input_tokens": getattr(usage, "input_tokens", None),
            "output_tokens": getattr(usage, "output_tokens", None),
            "cache_read_input_tokens": getattr(usage, "cache_read_input_tokens", 0),
        })

    # Thinking blocks may precede the answer, so collect only the text blocks.
    return "".join(block.text for block in resp.content if block.type == "text")


result = ask_sonnet5(
    messages=[{"role": "user", "content": "Find risks in this deployment plan."}],
    system="You are a pragmatic senior engineer. Return blockers first.",
    max_tokens=2500,
)
print(result)
Node.js example for a coding agent route
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function reviewWithSonnet5(diff) {
  const response = await client.messages.create({
    model: "claude-sonnet-5",
    max_tokens: 3000,
    thinking: { type: "adaptive" },
    system: [
      {
        type: "text",
        text: "You are a senior code reviewer. Prioritize security, data loss, and backwards compatibility."
      }
    ],
    messages: [
      { role: "user", content: `Review this diff:\n\n${diff}` }
    ]
  });

  // Thinking blocks may precede the answer, so keep only the text blocks.
  const text = response.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");

  return { text, usage: response.usage };
}
Cost check: the promotional price is not the whole story
The launch price looks great: $2 per million input tokens and $10 per million output tokens through August 31, 2026, then $3/$15. But the tokenizer change matters. If the same codebase prompt becomes 30% more tokens, the input bill at the launch price falls by roughly 13% against Sonnet 4.6's $3 rate, not the 33% the pricing table suggests, and at the standard price it rises by about 30%.
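To make the arithmetic concrete, here is a small worked example. The workload size (100,000 input tokens, 2,000 output tokens) is hypothetical, and applying the ~30% tokenizer figure equally to input and output is an assumption; real output counts also depend on how much the model thinks and writes.

```python
def request_cost(input_tokens: float, output_tokens: float,
                 input_price: float, output_price: float) -> float:
    """Dollar cost of one request; prices are per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumption: the ~30% tokenizer figure applies equally to input and output.
TOKEN_INFLATION = 1.30

# Hypothetical workload, as measured on Sonnet 4.6.
old_input, old_output = 100_000, 2_000
new_input = old_input * TOKEN_INFLATION
new_output = old_output * TOKEN_INFLATION

baseline = request_cost(old_input, old_output, 3, 15)   # Sonnet 4.6 at $3/$15
promo = request_cost(new_input, new_output, 2, 10)      # Sonnet 5 launch price
standard = request_cost(new_input, new_output, 3, 15)   # Sonnet 5 standard price

print(f"Sonnet 4.6:           ${baseline:.3f}")
print(f"Sonnet 5 (promo):     ${promo:.3f}")
print(f"Sonnet 5 (standard):  ${standard:.3f}")
```

Under these assumptions the request costs about $0.330 on Sonnet 4.6, $0.286 on Sonnet 5 during the promotion, and $0.429 once standard pricing applies. Replace the inflation factor with your own measured ratio before drawing conclusions.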
Before moving all traffic, run your top 20 production prompts through token counting for both Sonnet 4.6 and Sonnet 5. Then compare actual input tokens, output length, latency, refusal rate, and task success.
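One way to structure that comparison is sketched below. `compare_token_counts` is a hypothetical helper that takes the counting function as a parameter, so it can be exercised offline; the commented-out wiring assumes the `anthropic` Python SDK's `client.messages.count_tokens` call and that both model IDs are accepted by it.

```python
from typing import Callable, Iterable


def compare_token_counts(
    prompts: Iterable[str],
    count_tokens: Callable[[str, str], int],
    old_model: str = "claude-sonnet-4-6",
    new_model: str = "claude-sonnet-5",
) -> list:
    """Count each prompt under both models and report the inflation ratio."""
    rows = []
    for prompt in prompts:
        old = count_tokens(old_model, prompt)
        new = count_tokens(new_model, prompt)
        rows.append({
            "prompt": prompt[:40],  # short preview for the report
            "old": old,
            "new": new,
            "ratio": new / old if old else None,
        })
    return rows


# Possible wiring with the anthropic SDK (an assumption, not run here):
#
# def count_with_sdk(model: str, prompt: str) -> int:
#     result = client.messages.count_tokens(
#         model=model,
#         messages=[{"role": "user", "content": prompt}],
#     )
#     return result.input_tokens
```

Sorting the rows by `ratio` shows which prompt types (code, JSON, logs, non-English text) inflate the most and deserve a closer look.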
For teams running multi-model gateways, this is a perfect place to use a cost calculator instead of guessing. Long-context requests might still belong on Sonnet 5. Small deterministic extraction tasks may belong on a cheaper model. That's not a downgrade; it's routing discipline.
A practical rollout plan
- Create a new route: Add claude-sonnet-5 beside your existing Sonnet 4.6 route.
- Run shadow tests: Replay recent prompts without sending Sonnet 5 output to users.
- Remove invalid parameters: Strip non-default sampling fields and old extended-thinking budgets.
- Recount tokens: Pay special attention to code, JSON, logs, and non-English text.
- Raise output limits selectively: Don't blindly set 128k. That's how bills get weird.
- Canary by endpoint: Start with code review, research, or internal agent workflows before customer-facing automation.
- Keep fallback routing: If Sonnet 5 refuses a category of cyber-adjacent task or hits limits, route safely instead of failing hard.
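The canary and fallback steps above can be sketched as follows. Everything here is hypothetical scaffolding: `pick_model`, `call_with_fallback`, and the `send` callable stand in for whatever your gateway actually uses, and the broad `except Exception` should be narrowed to your client's real error types.

```python
import hashlib

STABLE_MODEL = "claude-sonnet-4-6"
CANARY_MODEL = "claude-sonnet-5"


def pick_model(request_id: str, endpoint: str,
               canary_percent: int, canary_endpoints: set) -> str:
    """Route a fixed share of traffic on selected endpoints to the canary."""
    if endpoint not in canary_endpoints:
        return STABLE_MODEL
    # Hash the request ID so the same request always lands in the same bucket.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return CANARY_MODEL if bucket < canary_percent else STABLE_MODEL


def call_with_fallback(send, payload: dict, model: str = CANARY_MODEL):
    """Try the chosen model; on failure, retry once on the stable route."""
    try:
        return send(model, payload)
    except Exception:  # narrow this to your client's error types in practice
        if model == STABLE_MODEL:
            raise
        return send(STABLE_MODEL, payload)
```

Hashing the request ID instead of calling a random number generator keeps routing reproducible, which helps when replaying an incident.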
KissAPI can help here if you want one OpenAI-compatible endpoint for model routing, fallback, and quick A/B tests across providers. Use it as an operational layer, not a magic wand: you still need good evals and sane token budgets.
When should you switch?
My take: switch early for coding agents, long-context document analysis, and multi-step automation where Sonnet 4.6 was getting stuck. Wait and test for highly controlled JSON extraction, support classifiers, and workloads with tight cost ceilings. Sonnet 5 is stronger, but stronger models are not automatically cheaper in production.
The best migration is boring: a new route, a canary, a dashboard, and a rollback button. Do that, and Sonnet 5 is a clean upgrade rather than a Friday incident.
FAQ
What is the Claude Sonnet 5 API model ID?
The model ID is claude-sonnet-5. Treat it as a new production route first, then migrate traffic after testing token counts and behavior changes.
Does Claude Sonnet 5 support 1M context?
Yes. Anthropic says Sonnet 5 has a 1M token context window by default and supports up to 128k output tokens.
Is Claude Sonnet 5 cheaper than Sonnet 4.6?
During the introductory period, yes on per-token pricing: $2/$10 per million input/output tokens through August 31, 2026. After that it moves to $3/$15, the same standard rate as Sonnet 4.6. But the new tokenizer can produce more tokens for the same text, so measure your workload.