Sakana Fugu API Access Guide (2026): Orchestration Model Setup & Code
On June 22, 2026, Sakana AI shipped something that doesn't fit the usual "bigger model, higher benchmark" story. They launched Sakana Fugu and Fugu Ultra — not another monolithic frontier model, but an orchestration model. Fugu is itself an LLM trained to call other LLMs. You send one request, and it decides whether to answer directly or assemble a team of expert models to do the work.
If you've ever hand-built a multi-agent pipeline — a planner here, a coder there, a verifier at the end — you know how much glue code that takes. Fugu's pitch is that the orchestration logic now lives inside the model, and you talk to it through a single OpenAI-compatible endpoint. That's worth understanding even if you don't adopt it tomorrow, because it changes how you think about routing.
What "Orchestration Model" Actually Means
Most routing today is something you write. You classify a request, pick a model, maybe fan out to two and merge. Fugu moves that decision into a trained model. From the outside it looks like one API call. On the inside, it's picking models from a pool, delegating subtasks, verifying, and synthesizing a final answer.
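For contrast, here is a minimal sketch of the hand-written routing described above: classify, pick a model, fan out where needed, and merge. The keyword classifier, the model names (`small-model`, `code-model`, `reasoning-model`), and the `fake_call` stub are hypothetical placeholders for illustration, not any vendor's API.

```python
from typing import Callable

# Hypothetical routing table: request category -> model name(s) to call.
ROUTES: dict[str, list[str]] = {
    "code": ["code-model"],
    "research": ["reasoning-model", "code-model"],  # fan out to two, then merge
    "chat": ["small-model"],
}


def classify(prompt: str) -> str:
    """Crude keyword classifier standing in for a real one."""
    text = prompt.lower()
    if any(word in text for word in ("refactor", "bug", "function")):
        return "code"
    if any(word in text for word in ("paper", "investigate", "compare")):
        return "research"
    return "chat"


def route(prompt: str, call_model: Callable[[str, str], str]) -> str:
    """Pick the model(s) for a prompt, call each one, and merge the answers."""
    models = ROUTES[classify(prompt)]
    answers = [call_model(model, prompt) for model in models]
    return "\n---\n".join(answers)


def fake_call(model: str, prompt: str) -> str:
    """Stub so the sketch runs without network access."""
    return f"[{model}] answer to: {prompt}"


if __name__ == "__main__":
    print(route("Refactor this function", fake_call))
```

Every branch in `classify` and every row in `ROUTES` is something you have to maintain by hand, which is the glue an orchestration model is meant to absorb.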
The architecture builds on two ICLR 2026 papers from Sakana — TRINITY (an evolved coordinator that assigns Thinker / Worker / Verifier roles) and the Conductor (an RL-trained model that learns natural-language coordination strategies). The practical upshot: you stop maintaining a brittle if/else router and let the model handle assembly.
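To make the role split concrete, here is a toy Thinker / Worker / Verifier loop. It only illustrates the division of labor named above. It is not TRINITY or the Conductor, and every function in it (`thinker`, `worker`, `verifier`, `run_team`) is a hypothetical stub where a real system would call a model.

```python
def thinker(task: str) -> list[str]:
    """Plan: split the task into subtasks (stub: split on ' and ')."""
    return [part.strip() for part in task.split(" and ") if part.strip()]


def worker(subtask: str, attempt: int) -> str:
    """Execute one subtask (stub: the first attempt is deliberately empty)."""
    return "" if attempt == 0 else f"done: {subtask}"


def verifier(result: str) -> bool:
    """Accept or reject a worker result (stub: reject empty output)."""
    return bool(result)


def run_team(task: str, max_attempts: int = 3) -> list[str]:
    """Plan, execute each subtask, and retry until the verifier accepts."""
    results = []
    for subtask in thinker(task):
        for attempt in range(max_attempts):
            result = worker(subtask, attempt)
            if verifier(result):
                results.append(result)
                break
        else:
            # No attempt passed verification within the budget.
            results.append(f"failed: {subtask}")
    return results


if __name__ == "__main__":
    print(run_team("refactor the parser and add tests"))
```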
There's a strategic angle too. Sakana frames Fugu as a hedge against single-vendor risk. If one provider restricts access — export controls, regional rules, an outage — the orchestrator routes around it. That's the same instinct a lot of teams already have when they keep a backup endpoint warm.
Fugu vs Fugu Ultra: Pick the Right One
There are two models at launch, both behind the same API. Don't overthink it:
| Model | Best for | Trade-off |
|---|---|---|
| Fugu | Everyday coding, code review, chatbots, interactive services | Lower latency, strong but not maxed-out quality |
| Fugu Ultra | Hard multi-step work: research, paper reproduction, security analysis, deep investigations | Higher quality, deeper agent pool, more latency and cost |
On Sakana's own published benchmarks, Fugu Ultra sits shoulder-to-shoulder with top frontier models — 73.7 on SWE-Bench Pro, 82.1 on TerminalBench 2.1, 95.5 on GPQA-D — and beats the individual models it orchestrates on several of them. The interesting claim isn't "we have the smartest model." It's "coordinating several strong models can beat any one of them." For agentic, long-running tasks, that's plausible.
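One way to encode the table above in application code is a small lookup that picks the tier per task. This is a sketch under assumptions: the task category keys are hypothetical labels, and the identifiers `fugu` and `fugu-ultra` are the ones used in this guide's examples, so confirm them against Sakana's docs.

```python
# Task categories paraphrase the comparison table; the keys themselves are
# hypothetical labels chosen for this sketch.
ULTRA_TASKS = {
    "research",
    "paper_reproduction",
    "security_analysis",
    "deep_investigation",
}


def pick_model(task_kind: str, latency_sensitive: bool = False) -> str:
    """Return the model identifier to request for a given kind of task."""
    if latency_sensitive:
        # Interactive paths favor the lower-latency tier.
        return "fugu"
    return "fugu-ultra" if task_kind in ULTRA_TASKS else "fugu"


if __name__ == "__main__":
    print(pick_model("code_review"))        # everyday work
    print(pick_model("security_analysis"))  # hard multi-step work
```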
Calling Fugu: It's an OpenAI-Compatible Endpoint
The nice part for developers: Fugu speaks the OpenAI chat-completions format. If you've written an OpenAI client, you've basically already written a Fugu client. Swap the base URL, the API key, and the model name.
curl
curl https://api.sakana.ai/v1/chat/completions \
  -H "Authorization: Bearer $SAKANA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fugu",
    "messages": [
      {"role": "system", "content": "You are a senior backend engineer."},
      {"role": "user", "content": "Refactor this function for readability and add tests."}
    ]
  }'
Switch "model": "fugu" to "model": "fugu-ultra" when you want the deeper pool. Same request shape, different effort tier. Always confirm exact model identifiers and the base URL against Sakana's console docs, since naming can shift right after launch.
Python (openai SDK)
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SAKANA_API_KEY"],
    base_url="https://api.sakana.ai/v1",
)

resp = client.chat.completions.create(
    model="fugu-ultra",
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": "Review this diff and rank issues by severity."},
    ],
)

print(resp.choices[0].message.content)
Because it's the standard SDK, your retries, streaming, and logging middleware carry over unchanged. That's the whole point of an OpenAI-compatible surface — no new client to babysit.
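As an illustration of middleware that does not care which OpenAI-compatible backend sits behind it, here is a minimal retry wrapper with exponential backoff. `with_retries` and `TransientError` are hypothetical names for this sketch. In practice you would map your client's timeout and rate-limit exceptions onto the retryable case.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (timeout, 429, 5xx)."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TransientError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

The wrapper takes a zero-argument callable, so the same code can wrap a call to any route, for example `with_retries(lambda: client.chat.completions.create(model="fugu", messages=messages))`. The `sleep` parameter is injectable so the backoff can be tested without waiting.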
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SAKANA_API_KEY,
  baseURL: "https://api.sakana.ai/v1",
});

const res = await client.chat.completions.create({
  model: "fugu",
  messages: [
    { role: "system", content: "You are a precise assistant." },
    { role: "user", content: "Summarize this RFC and list open questions." },
  ],
});

console.log(res.choices[0].message.content);
Where Fugu Fits (and Where It Doesn't)
Be honest about the trade-offs before you wire it into production.
- Good fit: messy, long-running, multi-step tasks. Code review that needs to actually find bugs. Research loops. Anything where a single model call tends to miss things and you'd otherwise build an agent yourself.
- Weak fit: tight-latency, high-QPS endpoints where you need a sub-second response and predictable cost per call. Orchestration adds steps, and steps add time and tokens.
- Watch the cost: an orchestrated answer can fan out to several underlying models. The quality can be worth it, but you should meter spend per endpoint, not assume one flat per-token rate.
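Per-endpoint metering can be sketched as a small accumulator fed from each response's token counts. The prices below are hypothetical placeholders, and `SpendMeter` is an illustrative name. The sketch also assumes the response reports usage in the OpenAI-style `prompt_tokens` and `completion_tokens` fields and that those counts reflect the full fan-out, which you should verify against Sakana's billing docs.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpendMeter:
    """Accumulates estimated cost per application endpoint."""

    # Hypothetical prices in USD per million tokens; substitute real rates.
    input_price_per_m: float = 1.0
    output_price_per_m: float = 4.0
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, endpoint: str, prompt_tokens: int, completion_tokens: int) -> float:
        """Add one call's usage to the endpoint's running total; return its cost."""
        cost = (
            prompt_tokens * self.input_price_per_m
            + completion_tokens * self.output_price_per_m
        ) / 1_000_000
        self.totals[endpoint] += cost
        return cost


if __name__ == "__main__":
    meter = SpendMeter()
    meter.record("/review", prompt_tokens=1200, completion_tokens=800)
    meter.record("/chat", prompt_tokens=300, completion_tokens=150)
    for endpoint, usd in meter.totals.items():
        print(f"{endpoint}: ${usd:.6f}")
```

With the openai SDK, a call site would look like `meter.record("/review", resp.usage.prompt_tokens, resp.usage.completion_tokens)`.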
One more practical note: Fugu lets you opt specific providers or models out of its pool for data, privacy, or compliance reasons. If your org bans a given vendor, you can exclude it instead of dropping the whole approach.
The Bigger Pattern: Don't Marry One Endpoint
Whether or not you adopt Fugu, the launch underlines a trend that's been building all year: teams want frontier capability without betting the whole stack on one provider. You can get that two ways — let a model like Fugu orchestrate for you, or keep your own routing layer with a clean fallback path.
Plenty of teams already do the second version. They run their primary model directly, then keep an OpenAI-compatible secondary route ready for spikes or regional issues. That's exactly the kind of setup KissAPI exists for — one endpoint that fronts Claude, GPT-5, Gemini, and more, so swapping or falling back between models is a config change, not a rewrite. Different mechanism than Fugu, same goal: don't let a single dependency become a single point of failure.
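A self-managed fallback path can be as small as an ordered list of routes tried in turn. This is a minimal sketch: `complete_with_fallback` and the `Route` alias are hypothetical names, and each route is any callable that takes a prompt and returns text, for example a thin wrapper around an OpenAI-compatible client pointed at a different base URL.

```python
from typing import Callable, Sequence

# A route is a (name, callable) pair; the callable maps a prompt to an answer.
Route = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, routes: Sequence[Route]) -> tuple[str, str]:
    """Try each route in order; return (route_name, answer) from the first success."""
    errors = []
    for name, call in routes:
        try:
            return name, call(prompt)
        except Exception as exc:  # In production, catch only retryable errors.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all routes failed: " + "; ".join(errors))


if __name__ == "__main__":
    def primary(prompt: str) -> str:
        raise ConnectionError("regional outage")  # simulated failure

    def secondary(prompt: str) -> str:
        return f"answer to: {prompt}"

    print(complete_with_fallback("ping", [("primary", primary), ("secondary", secondary)]))
```

Returning the route name alongside the answer makes it easy to log how often the secondary path is actually used.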
If you want to ballpark what any of this costs before you commit, run your token volumes through a calculator first. Orchestration and multi-model routing both make per-request spend harder to eyeball, so estimate it up front instead of finding out on the invoice.
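If you would rather do the arithmetic yourself, the estimate is a few multiplications. In this sketch the function name, the example prices, and the `fanout` multiplier (a rough guess at how many underlying model calls an orchestrated request triggers) are all assumptions, not published Sakana pricing.

```python
def estimate_monthly_cost(
    requests_per_day: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    fanout: float = 1.0,
    days: int = 30,
) -> float:
    """Estimate monthly spend in USD; fanout scales cost for multi-model calls."""
    per_request = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return requests_per_day * days * per_request * fanout


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests/day, 2,000 input and 500 output
    # tokens each, at $3 / $15 per million tokens, with a 3x fan-out guess.
    cost = estimate_monthly_cost(10_000, 2_000, 500, 3.0, 15.0, fanout=3.0)
    print(f"${cost:,.2f} per month")
```

Running the same numbers with `fanout=1.0` gives the single-model baseline, which makes the orchestration premium explicit.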
Frequently Asked Questions
What is Sakana Fugu and how is it different from a normal LLM?
Sakana Fugu, launched June 22, 2026, is an orchestration model: a language model trained to call other LLMs in an agent pool rather than answering everything itself. You hit one OpenAI-compatible endpoint, and Fugu decides whether to answer directly or assemble a team of expert models for a complex, multi-step task.
What's the difference between Fugu and Fugu Ultra?
Fugu balances performance and latency and is the better default for everyday coding, code review, and interactive chat. Fugu Ultra coordinates a deeper pool of expert agents to maximize answer quality on hard, multi-step problems like research and security analysis, at the cost of higher latency and spend.
Is Sakana Fugu available in the EU?
At launch Fugu is not available in the EU/EEA while Sakana AI works toward GDPR and EU-specific compliance. Both Fugu and Fugu Ultra are accessible elsewhere through a single OpenAI-compatible API, with subscription and pay-as-you-go tiers.