# How to Use GLM-5.1 with Claude Code in 2026: Gateway Setup Guide
GLM-5.1 is getting real attention from developers, and for good reason. It is strong enough on coding work that people immediately want it inside Claude Code. The problem is that a lot of setup advice on social posts is just wrong. Claude Code is not an OpenAI client with a different logo. If you point it at a plain /v1/chat/completions endpoint and hope for the best, you will waste an afternoon.
The setup that actually works is simpler than the rumors make it sound. Claude Code talks in Anthropic's Messages format. So to use GLM-5.1, you need a provider or gateway that accepts Anthropic-style requests, forwards the right headers, and maps Claude Code's internal model choices to glm-5.1.
This guide shows the clean path: verify the gateway first, set the right environment variables, remap Claude Code's default models, then test with a tiny prompt before you trust it with your repo.
## Why this topic matters right now
There are two reasons this keyword matters in April 2026.
- Claude Code now openly documents third-party providers and LLM gateways. That means this workflow is no longer some weird community hack.
- Z.ai's own docs recently published a GLM-5.1 coding-agent guide, including Claude Code model remapping. That turned a niche setup into something a lot more people are trying this week.
So yes, “use GLM-5.1 with Claude Code 2026” is a real search problem, not SEO filler. Developers want a setup that works today, not theory.
## What has to be true for Claude Code + GLM-5.1 to work
| Component | What it needs to do |
|---|---|
| Claude Code | Send Anthropic Messages API requests to endpoints such as /v1/messages |
| Gateway or provider | Accept Anthropic-style auth and request bodies, preserve headers like anthropic-version, and ideally expose /v1/messages/count_tokens |
| Upstream model | Route requests to glm-5.1 when Claude Code asks for Sonnet or Opus aliases |
This is the part people skip. If your provider only gives you an OpenAI-compatible endpoint, Claude Code will not talk to it directly. You need an Anthropic-compatible layer in front. That's the whole game.
Short version: Claude Code can use GLM-5.1, but only through an Anthropic-compatible gateway or provider. If the gateway only speaks OpenAI format, use Cursor, Aider, Codex CLI, or another tool instead.
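To make the format gap concrete, here is a minimal sketch of the same prompt expressed in both request shapes. These are plain Python dicts, no network involved; the field names follow the public Anthropic Messages and OpenAI Chat Completions schemas, and the model name is whatever your gateway expects:

```python
# Sketch only: one prompt, two wire formats.

anthropic_style = {  # what Claude Code sends to /v1/messages
    "model": "glm-5.1",
    "max_tokens": 256,  # required field in the Messages API
    "system": "You are a coding assistant.",  # system prompt is a top-level field
    "messages": [{"role": "user", "content": "Reply with exactly: ready"}],
}

openai_style = {  # what an OpenAI-only gateway expects at /v1/chat/completions
    "model": "glm-5.1",
    "messages": [
        # in this schema the system prompt is just another message
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Reply with exactly: ready"},
    ],
}

# The top-level key sets differ, which is why Claude Code cannot
# simply be pointed at an OpenAI-only endpoint.
print(sorted(anthropic_style) != sorted(openai_style))  # prints True
```

A gateway sitting in front of GLM-5.1 has to accept the first shape as-is; translating between the two is exactly the job of the Anthropic-compatible layer.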
## Step 1: Test the gateway before you touch Claude Code
Do not start in the editor. Start with one direct HTTP request. If this fails, Claude Code will fail too.
### curl test

```bash
export ANTHROPIC_BASE_URL="https://your-gateway.example.com"
export ANTHROPIC_AUTH_TOKEN="your-gateway-token"

curl "$ANTHROPIC_BASE_URL/v1/messages" \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
  -d '{
    "model": "glm-5.1",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Reply with exactly: ready"}
    ]
  }'
```
If you get a valid response, good. If you get 401, 403, or 404, stop there and fix it. Don't drag Claude Code into a problem that is still just HTTP.
One nuance here: some gateways want bearer auth, which is why Claude Code documents ANTHROPIC_AUTH_TOKEN for LLM gateways. Others expect Anthropic-style API keys. If your provider uses x-api-key instead, you can use ANTHROPIC_API_KEY in Claude Code and swap the header in your curl test.
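If you script your tests, the two auth styles are easy to cover with one small helper. This is a hypothetical convenience function, not part of Claude Code or any SDK; it just mirrors the variable names Claude Code documents:

```python
import os


def auth_headers(env=os.environ):
    """Build Messages API headers for either auth style.

    A bearer token (Claude Code's ANTHROPIC_AUTH_TOKEN) goes in the
    authorization header; an Anthropic-style key (ANTHROPIC_API_KEY)
    goes in x-api-key instead.
    """
    headers = {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    if env.get("ANTHROPIC_AUTH_TOKEN"):
        headers["authorization"] = f"Bearer {env['ANTHROPIC_AUTH_TOKEN']}"
    elif env.get("ANTHROPIC_API_KEY"):
        headers["x-api-key"] = env["ANTHROPIC_API_KEY"]
    else:
        raise RuntimeError("set ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY")
    return headers
```

Whichever branch your gateway needs in this helper is the same variable you should export for Claude Code itself.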
## Step 2: Point Claude Code at the gateway
Once the raw request works, Claude Code setup is mostly environment variables.
```bash
export ANTHROPIC_BASE_URL="https://your-gateway.example.com"
export ANTHROPIC_AUTH_TOKEN="your-gateway-token"
claude
```
If your gateway uses Anthropic-style keys instead of bearer tokens, use this instead:
```bash
export ANTHROPIC_BASE_URL="https://your-gateway.example.com"
export ANTHROPIC_API_KEY="your-api-key"
claude
```
That gets Claude Code to the right server. It does not yet make Claude Code pick glm-5.1. For that, you need model remapping.
## Step 3: Remap Claude Code's default models to GLM-5.1
Claude Code internally thinks in terms of Sonnet, Opus, and Haiku. The practical way to steer it toward GLM-5.1 is to override those defaults in ~/.claude/settings.json.
```json
{
  "env": {
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.1",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.1",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air"
  }
}
```
This is the same pattern recent GLM coding-agent docs recommend. It works because Claude Code still asks for its internal model roles, and the gateway or provider resolves those names to the upstream model you want.
If you want GLM-5.1 for almost everything, map both Sonnet and Opus to glm-5.1. If you want cheaper fallbacks for small tasks, leave the Haiku slot mapped to a lighter model.
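If you manage dotfiles with scripts, a small merge helper keeps these overrides from clobbering the rest of the file. This is an assumption about your workflow, not an official tool; the model names mirror the settings.json example above:

```python
import json
from pathlib import Path

# Same overrides as the settings.json snippet above.
OVERRIDES = {
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.1",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.1",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
}


def apply_overrides(settings_path: Path) -> dict:
    """Merge model overrides into the env block of a Claude Code settings file,
    preserving any other keys already present."""
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    else:
        settings = {}
    settings.setdefault("env", {}).update(OVERRIDES)
    settings_path.write_text(json.dumps(settings, indent=2))
    return settings


# Typical call: apply_overrides(Path.home() / ".claude" / "settings.json")
```

Merging instead of overwriting matters because settings.json often holds unrelated configuration you do not want a one-off script to erase.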
## Step 4: Run Claude Code, then start with a boring prompt
Launch claude, run /status, and open a real repo. But do not begin with a giant agent task. Your first prompt should be dull on purpose:
> Read these files, explain the main bug in five bullets, and do not edit anything yet.
If that works, move on to a small edit. If that works too, then trust it with bigger refactors. The mistake most people make is giving a fresh setup a mission-critical task before they even know if routing, streaming, and token counting are stable.
## Python test for the same gateway
Testing outside Claude Code is still useful, especially when you want something you can paste into CI or a notebook.
```python
import os

import requests

url = os.environ["ANTHROPIC_BASE_URL"].rstrip("/") + "/v1/messages"
headers = {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01",
    "authorization": f"Bearer {os.environ['ANTHROPIC_AUTH_TOKEN']}",
}
payload = {
    "model": "glm-5.1",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Explain why rate limiting matters for coding agents."}
    ],
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["content"][0]["text"])
```
## Node.js test
```javascript
const url = `${process.env.ANTHROPIC_BASE_URL.replace(/\/$/, "")}/v1/messages`;

const resp = await fetch(url, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01",
    "authorization": `Bearer ${process.env.ANTHROPIC_AUTH_TOKEN}`
  },
  body: JSON.stringify({
    model: "glm-5.1",
    max_tokens: 256,
    messages: [
      { role: "user", content: "Explain why rate limiting matters for coding agents." }
    ]
  })
});

if (!resp.ok) {
  throw new Error(`${resp.status} ${await resp.text()}`);
}

const data = await resp.json();
console.log(data.content[0].text);
```
## The mistakes that break this setup most often
### 1. Using an OpenAI endpoint with Claude Code
This is the big one. Claude Code expects Anthropic Messages format. If your provider only offers /v1/chat/completions, this setup is dead on arrival.
### 2. Wrong auth variable
If the gateway expects bearer auth, use ANTHROPIC_AUTH_TOKEN. If it expects Anthropic-style keys, use ANTHROPIC_API_KEY. Mixing them is an easy way to manufacture a fake mystery.
### 3. Forgetting model remapping
You can point Claude Code at the gateway correctly and still never hit glm-5.1 if you do not override the default model names.
### 4. Missing /count_tokens or dropped Anthropic headers
Claude Code's gateway docs are pretty clear here. If the proxy drops anthropic-version, betas, or token-counting support, behavior gets flaky fast.
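You can catch a missing token-counting route before it shows up as flakiness by probing it directly. A hedged sketch: the /v1/messages/count_tokens path follows the public Anthropic API, but whether your particular gateway implements it is exactly what you are testing here.

```python
import json
import urllib.error
import urllib.request

COUNT_TOKENS_PATH = "/v1/messages/count_tokens"


def build_probe(base_url: str):
    """Build the URL and a minimal payload for a count_tokens probe."""
    url = base_url.rstrip("/") + COUNT_TOKENS_PATH
    payload = {
        "model": "glm-5.1",
        "messages": [{"role": "user", "content": "ping"}],
    }
    return url, payload


def probe_count_tokens(base_url: str, headers: dict) -> bool:
    """Return True if the gateway answers the route.

    An HTTP 404 usually means the proxy never implemented token
    counting, which is one of the things that makes Claude Code flaky.
    """
    url, payload = build_probe(base_url)
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return False
```

Run it with the same base URL and headers as your step 1 curl test; if it returns False while /v1/messages works, you have found the source of the flakiness early.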
### 5. Using GLM-5.1 for every tiny prompt
That is not a setup issue. That is a judgment issue. Stronger models are great for ugly debugging, planning, and multi-file edits. They are not the best answer to every throwaway question.
## How I'd actually use GLM-5.1 in Claude Code
I would use it for the turns where weaker models lose the thread:
- multi-file refactors
- debugging from logs plus code plus tests
- planning larger changes before editing
- fixes where you care more about first-pass quality than raw speed
I would not burn it on trivial file renames, tiny formatting fixes, or “what does this function do?” prompts. That's how people turn a good model into an expensive habit.
If you already juggle Claude, GPT, and a few newer models, a unified gateway pattern matters more than people admit. It keeps Claude Code, scripts, and billing from turning into vendor spaghetti. KissAPI is useful in that broader sense: one endpoint for the models you actually use every day, instead of six different configs rotting in your shell history.
## Want a cleaner multi-model setup?
Keep your coding workflow behind one API endpoint, switch models when the task changes, and stop rebuilding the same config stack every week.
## Final thought
GLM-5.1 with Claude Code is real, but only if you respect the protocol layer. Verify the Anthropic-compatible gateway first, remap Claude Code's model defaults second, then test with a boring prompt before you let it touch a live repo. Do that, and the setup is clean. Skip it, and you will blame the wrong thing for hours.