GPT-5.4 API Guide: Setup, Pricing & Code Examples (2026)
OpenAI shipped GPT-5.4 on March 5th, and it's a big release. Three model variants, a 1 million token context window, native computer use, and a new "tool search" feature that changes how agents work. If you build anything on top of OpenAI's API, you need to understand what's different and how to use it.
This guide covers the practical stuff: what each variant does, what it costs, and how to actually call the API with working code. No hype, just the details you need to start building.
The Three Variants, Explained
GPT-5.4 isn't one model. It's three, and picking the wrong one will either cost you too much or give you worse results than you need.
GPT-5.4 (Base)
The general-purpose model. Fast, capable, and the cheapest of the three. This is what you want for most production workloads: chat, content generation, code completion, data extraction. Think of it as the successor to GPT-5.2 with better instruction following and a much larger context window.
GPT-5.4 Thinking
The reasoning variant. It takes longer to respond because it "thinks" before answering — similar to o1 and o3, but built on the 5.4 architecture. Use this for math, logic puzzles, multi-step planning, complex code generation, and anything where accuracy matters more than speed. Available to Plus, Team, and Pro users in ChatGPT, and through the API.
GPT-5.4 Pro
The high-performance variant optimized for enterprise workloads. Better at long-context tasks, more consistent outputs, and higher rate limits. Available through the API and for ChatGPT Enterprise and Edu subscribers. If you're processing legal documents, financial reports, or large codebases, this is the one.
Pricing Breakdown
Here's what each variant costs through OpenAI's API:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| GPT-5.4 | $2.50 | $10.00 | 1M tokens |
| GPT-5.4 Thinking | $5.00 | $20.00 | 1M tokens |
| GPT-5.4 Pro | $10.00 | $40.00 | 1M tokens |
One important catch: once your input exceeds 272,000 tokens, the per-token cost doubles. So a 500K-token prompt on GPT-5.4 base costs $5.00/M for the portion above 272K. Keep this in mind if you're stuffing entire codebases into the context.
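To make the boundary concrete, here's a quick cost estimator (a sketch built from the prices in this article; the 272K threshold and the 2× multiplier come from the table above, not from any official pricing calculator):

```python
def estimate_input_cost(tokens: int, base_rate_per_m: float = 2.50,
                        boundary: int = 272_000) -> float:
    """Estimate input cost in dollars: tokens past the boundary bill at 2x."""
    cheap = min(tokens, boundary)
    expensive = max(tokens - boundary, 0)
    return (cheap * base_rate_per_m + expensive * base_rate_per_m * 2) / 1_000_000

# The 500K-token example above, on base GPT-5.4:
print(round(estimate_input_cost(500_000), 2))  # → 1.82
```

That 500K prompt costs $1.82 for input alone: the 228K tokens past the boundary bill at $5.00/M instead of $2.50/M.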
Cost comparison: GPT-5.4 base at $2.50/M input is actually cheaper than GPT-5.2 was at launch ($3.00/M). OpenAI is getting more aggressive on pricing, likely in response to Claude Sonnet 4.6 at $3.00/M and DeepSeek V3 undercutting everyone.
Quick Start: Your First API Call
If you've used the OpenAI API before, nothing changes structurally. Same endpoint, same SDK, new model name.
curl
```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-5.4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the CAP theorem in three sentences."}
    ],
    "max_tokens": 500
  }'
```
Python (openai SDK)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the CAP theorem in three sentences."},
    ],
    max_tokens=500,
)
print(response.choices[0].message.content)
```
Swap "gpt-5.4" for "gpt-5.4-thinking" or "gpt-5.4-pro" to use the other variants. Everything else stays the same.
Using GPT-5.4 Thinking Mode
The Thinking variant works like o3 — it reasons internally before producing output. The API response includes a reasoning field so you can see what the model considered.
```python
response = client.chat.completions.create(
    model="gpt-5.4-thinking",
    messages=[
        {"role": "user", "content": "A farmer has 17 sheep. All but 9 die. How many are left?"}
    ],
    max_tokens=1000,
)

# The model reasons internally before answering
print(response.choices[0].message.content)
# → "9 sheep are left."
```
Thinking mode is slower — expect 5-15 seconds for complex queries vs 1-3 seconds for base. But the accuracy improvement on reasoning tasks is significant. In my testing, it catches trick questions and multi-step logic problems that base GPT-5.4 still fumbles.
Native Computer Use
This is the headline feature. GPT-5.4 can control a computer: click buttons, type text, navigate applications, read screens. It's similar to what Anthropic shipped with Claude's computer use, but integrated directly into the model.
```python
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Open the calculator app and compute 847 * 293"}
    ],
    tools=[{
        "type": "computer_use",
        "display_width": 1920,
        "display_height": 1080,
    }],
    max_tokens=2000,
)

# The model returns tool_calls describing screen actions
for call in response.choices[0].message.tool_calls:
    print(call.function.name, call.function.arguments)
```
In practice, you'll want to run this in a sandboxed environment. Giving an AI model direct access to your desktop is... bold. OpenAI recommends using their Codex sandbox or a VM for computer use tasks.
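One way to keep execution contained is to dispatch each returned action to a sandbox controller rather than to the real desktop. The sketch below is illustrative only: the `click`/`type` action names and argument shapes are assumptions, not OpenAI's documented schema, and `SandboxDisplay` stands in for whatever VM or container API you actually use.

```python
import json

class SandboxDisplay:
    """Stand-in for a VM/container screen controller; records actions
    instead of touching a real desktop."""
    def __init__(self):
        self.log = []

    def click(self, x: int, y: int):
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str):
        self.log.append(f"type({text!r})")

def run_actions(tool_calls, display: SandboxDisplay):
    """Dispatch model tool calls to the sandboxed display."""
    for call in tool_calls:
        args = json.loads(call["arguments"])
        if call["name"] == "click":
            display.click(args["x"], args["y"])
        elif call["name"] == "type":
            display.type_text(args["text"])
        # Unknown actions are ignored rather than executed blindly.

# Simulated tool calls, shaped like the (assumed) API output:
calls = [
    {"name": "click", "arguments": '{"x": 640, "y": 360}'},
    {"name": "type", "arguments": '{"text": "847 * 293"}'},
]
display = SandboxDisplay()
run_actions(calls, display)
print(display.log)  # → ["click(640,360)", "type('847 * 293')"]
```

The point of the indirection: every action passes through code you control, so you can log, rate-limit, or veto anything before it touches a screen.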
Tool Search: The Underrated Feature
Tool search might be the most interesting addition for agent builders. Instead of cramming every tool definition into your system prompt (which eats tokens), GPT-5.4 can dynamically retrieve tool definitions when it needs them.
```python
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Check the weather in Tokyo and book a flight"}
    ],
    tools=[...],       # your full tool catalog
    tool_search=True,  # enable dynamic tool retrieval
)
```
With tool_search=True, the model doesn't load all tool definitions into context upfront. It scans them on demand, which can cut token usage by 30-50% if you have a large tool catalog. For agent frameworks with 20+ tools, this is a real cost saver.
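To decide whether it's worth enabling, first measure what your catalog costs up front. Here's a rough estimate using the common ~4-characters-per-token heuristic — real counts depend on the tokenizer, so treat this as a ballpark:

```python
import json

def estimate_catalog_tokens(tools: list[dict]) -> int:
    """Rough token estimate for a serialized tool catalog (~4 chars/token)."""
    return len(json.dumps(tools)) // 4

# A hypothetical 25-tool catalog:
tools = [
    {"type": "function", "function": {
        "name": f"tool_{i}",
        "description": "Does something useful with a few parameters.",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}, "limit": {"type": "integer"}}},
    }}
    for i in range(25)
]

print(estimate_catalog_tokens(tools))
```

If the estimate runs into thousands of tokens per request, dynamic retrieval starts paying for itself quickly.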
The 1M Context Window
OpenAI's largest context window yet. For reference:
- GPT-5.2 maxed out at 256K tokens
- Claude Sonnet 4.6 offers 200K tokens
- Gemini 3.1 Pro offers 1M tokens
So GPT-5.4 matches Gemini on context length. That's roughly 750,000 words, or about 3,000 pages of text. You can feed it an entire codebase, a full legal contract, or months of chat logs in a single prompt.
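If you do feed it a codebase, a simple pattern is to concatenate files under path headers and stop before a token budget. A minimal sketch (tokens approximated as characters ÷ 4, a rough heuristic rather than the real tokenizer):

```python
def build_codebase_prompt(files: dict[str, str], max_tokens: int = 272_000) -> str:
    """Concatenate files into one prompt, stopping before the token budget.
    Token count is approximated as len(text) // 4."""
    parts, used = [], 0
    for path, text in files.items():
        chunk = f"### {path}\n{text}\n"
        cost = len(chunk) // 4
        if used + cost > max_tokens:
            break  # stop before blowing the budget
        parts.append(chunk)
        used += cost
    return "".join(parts)

files = {"app.py": "print('hello')", "util.py": "def add(a, b):\n    return a + b"}
prompt = build_codebase_prompt(files)
print(prompt.startswith("### app.py"))  # → True
```

Defaulting the budget to 272K also keeps you on the cheap side of the pricing boundary.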
But remember the pricing catch: anything past 272K tokens costs double. A full 1M-token input on base GPT-5.4 would run about $4.32 — not cheap for a single request. Use it when you genuinely need it, not just because you can.
GPT-5.4 vs the Competition
How does it stack up against the other frontier models available right now?
| Feature | GPT-5.4 | Claude Sonnet 4.6 | Gemini 3.1 Pro |
|---|---|---|---|
| Context Window | 1M tokens | 200K tokens | 1M tokens |
| Input Price (per 1M) | $2.50 | $3.00 | $1.25 |
| Output Price (per 1M) | $10.00 | $15.00 | $5.00 |
| Computer Use | Native | Beta | No |
| Reasoning Mode | Thinking variant | Extended thinking | Thinking mode |
| Coding | Strong | Strongest | Strong |
My take: GPT-5.4 is the best general-purpose model OpenAI has shipped. The computer use feature is genuinely new territory for them. But Claude still edges it out on coding tasks, and Gemini is cheaper if you need long context. The right choice depends on your use case.
Using GPT-5.4 Through a Gateway
OpenAI's API works globally, but there are reasons to use a gateway instead of going direct: unified billing across multiple providers, better rate limits, automatic failover, and sometimes lower prices.
If you're already using an OpenAI-compatible gateway, switching to GPT-5.4 is just a model name change:
```python
from openai import OpenAI

# Point the SDK at your gateway instead of api.openai.com
client = OpenAI(
    api_key="your-gateway-key",
    base_url="https://api.kissapi.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Write a Python function to merge two sorted lists."}
    ],
)
print(response.choices[0].message.content)
```
The advantage of a gateway like KissAPI is that you can switch between GPT-5.4, Claude, Gemini, and DeepSeek without changing your code — just swap the model name. One API key, one billing dashboard, all the models.
Practical Tips
- Start with base GPT-5.4. It's the cheapest and fastest. Only upgrade to Thinking or Pro when you hit a quality ceiling.
- Watch the 272K boundary. If your prompts regularly exceed it, consider chunking your input or using a summarization step first.
- Enable tool search for agents. If you have more than 10 tools defined, `tool_search=True` will save you real money.
- Sandbox computer use. Don't give it access to your actual desktop. Use a VM or container.
- Test Thinking mode on your specific tasks. It's not universally better — for simple tasks, it's just slower and more expensive. But for anything requiring multi-step reasoning, the accuracy jump is worth it.
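For the chunking tip above, a minimal splitter (a sketch, again approximating tokens as characters ÷ 4; a real implementation would split on file or paragraph boundaries rather than mid-word):

```python
def chunk_text(text: str, max_tokens: int = 272_000) -> list[str]:
    """Split text into pieces that each fit under the token budget,
    approximating tokens as len(s) // 4."""
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

chunks = chunk_text("x" * 10_000, max_tokens=1_000)  # 4,000 chars per chunk
print(len(chunks))  # → 3
```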
Access GPT-5.4 Through KissAPI
All three GPT-5.4 variants available now. Plus Claude, Gemini, DeepSeek, and 200+ models through one API key. Pay-as-you-go, no minimums.
Bottom Line
GPT-5.4 is a solid release. The 1M context window finally matches Gemini, computer use opens up new agent possibilities, and tool search is a quiet game-changer for anyone building complex AI systems. The pricing is competitive — actually cheaper than GPT-5.2 on input tokens.
Whether it's worth switching from Claude or Gemini depends on what you're building. For general-purpose work and computer automation, GPT-5.4 is now the strongest option from OpenAI. For pure coding, Claude still has the edge. For cost-sensitive long-context work, Gemini wins on price.
The API is live now. Go build something.