GPT-5.3-Codex API Guide (2026): Pricing, Setup & Code Examples

OpenAI now ships two different "good at code" models, and people keep wiring up the wrong one. If you're paying GPT-5.5 rates to run a coding agent that fires 200 turns a day, you're lighting money on fire. The dedicated Codex model exists exactly for that workload, and it's a lot cheaper per token.

This guide covers GPT-5.3-Codex: what it is, how it's priced, when to reach for it over GPT-5.5, and how to call it from curl, Python, and Node.js. No fluff, just the stuff you need to ship.

What GPT-5.3-Codex Actually Is

GPT-5.3-Codex is OpenAI's dedicated API model for long-horizon agentic coding. Think of it as the engine behind the Codex CLI and similar agent loops: read files, plan, edit, run tests, read the failure, fix, repeat. It's tuned to stay coherent across many tool calls without drifting, and it tends to be more disciplined about following an existing codebase's conventions than the general flagship.

The important mental model: gpt-5.5 is the smart generalist, gpt-5.3-codex is the specialist that runs cheap and long. They share the same OpenAI-compatible API shape, so switching is a one-line change.

Pricing: Why This Choice Matters

Here's the part that changes your bill. Codex output tokens are where coding agents spend most of their money, and that's exactly where GPT-5.3-Codex is cheaper.

Model            Input / 1M    Cached Input / 1M    Output / 1M
gpt-5.3-codex    $1.75         $0.175               $14.00
gpt-5.5          $5.00         $0.50                $30.00
gpt-5.4          $2.50         $0.25                $15.00

Look at output: $14 vs $30 per 1M tokens, a 53% drop. Input falls even further, from $5.00 to $1.75. Agent loops generate a lot of output: diffs, tool arguments, reasoning steps, and retries. On an output-heavy workload, switching the agent's worker model from GPT-5.5 to GPT-5.3-Codex can cut spend by more than half without changing your code quality much.

Rule of thumb: if a model is doing repeated edit-test-fix turns inside an agent, default to GPT-5.3-Codex. Save GPT-5.5 for the planning step or one-shot questions where raw reasoning matters more than cost.

Minimal curl Example

GPT-5.3-Codex speaks the standard OpenAI-compatible API. Here's a bare request through the Chat Completions endpoint:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-5.3-codex",
    "messages": [
      {"role": "system", "content": "You are a careful senior engineer. Make minimal, correct edits and explain risky changes."},
      {"role": "user", "content": "Refactor this function to use async/await and add error handling:\n\nfunction load(cb){ fs.readFile(\"x.json\", cb); }"}
    ]
  }'

Same headers, same body shape you already use. The only thing that changes is the model field.

Python: A Practical Coding Worker

This is the pattern I actually use for an agent worker. Keep the system prompt stable so cached input kicks in, and keep edits scoped.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM = """You are a coding agent worker.
Rules:
1) Return a unified diff only, no prose, unless asked.
2) Touch the fewest lines possible.
3) Never invent file paths or APIs you haven't seen.
""".strip()


def edit_code(instruction: str, file_context: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-5.3-codex",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"{instruction}\n\nFile:\n{file_context}"},
        ],
    )
    usage = resp.usage
    print("in:", usage.prompt_tokens, "out:", usage.completion_tokens)
    return resp.choices[0].message.content


if __name__ == "__main__":
    diff = edit_code(
        "Add input validation that rejects negative amounts.",
        "def charge(amount):\n    return gateway.run(amount)",
    )
    print(diff)

Notice the print on usage. If you're running agents, log token counts per turn from day one. It's the only way to catch a retry loop before it shows up as a surprise invoice.
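
A minimal sketch of what per-turn logging could look like once a single print stops being enough. The `UsageLog` class and its method names are hypothetical; it only assumes you pass in the `prompt_tokens` and `completion_tokens` values shown in the worker above.

```python
from dataclasses import dataclass, field


@dataclass
class UsageLog:
    """Accumulates per-turn token counts for one agent task (illustrative helper)."""

    turns: list[tuple[int, int]] = field(default_factory=list)

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Store the pair and print it, so every turn leaves a trace in the logs.
        self.turns.append((prompt_tokens, completion_tokens))
        print(f"turn {len(self.turns)}: in={prompt_tokens} out={completion_tokens}")

    @property
    def total_in(self) -> int:
        return sum(t[0] for t in self.turns)

    @property
    def total_out(self) -> int:
        return sum(t[1] for t in self.turns)
```

Inside `edit_code` you would call something like `log.record(usage.prompt_tokens, usage.completion_tokens)` after each response, then compare `log.total_out` against whatever budget you consider normal for a task.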

Node.js: Streaming for Agent Loops

Streaming matters for coding agents because you often want to start parsing or canceling before the full response lands.

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function streamEdit(instruction, fileContext) {
  const stream = await client.chat.completions.create({
    model: "gpt-5.3-codex",
    stream: true,
    messages: [
      { role: "system", content: "Return a unified diff. Minimal changes only." },
      { role: "user", content: `${instruction}\n\nFile:\n${fileContext}` },
    ],
  });

  let out = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content || "";
    out += token;
    process.stdout.write(token);
  }
  return out;
}
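
For the Python worker, the "cancel before the full response lands" idea can be sketched roughly like this. The function name `collect_until` and the `max_chars` budget are illustrative assumptions, not API settings; `tokens` stands in for the text deltas you would pull out of a streaming response.

```python
from typing import Iterable


def collect_until(tokens: Iterable[str], max_chars: int) -> tuple[str, bool]:
    """Join streamed text pieces, stopping early once max_chars is exceeded.

    Returns (text, truncated). The caller decides what to do with a
    truncated result, for example closing the stream and retrying.
    """
    parts: list[str] = []
    size = 0
    for token in tokens:
        parts.append(token)
        size += len(token)
        if size > max_chars:
            # Stop consuming here; nothing further is read from the iterator.
            return "".join(parts), True
    return "".join(parts), False
```

The design choice is to keep the budget check separate from the network code, so the same guard works on any iterator of strings and is easy to test without an API key.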

When to Use GPT-5.3-Codex vs GPT-5.5

Don't overthink this. A simple routing table covers most teams:

Task                          Pick             Why
Agent edit-test-fix loop      gpt-5.3-codex    Cheap output, stays on-task across turns
Multi-file refactor           gpt-5.3-codex    Follows conventions, lower cost at volume
Architecture / design call    gpt-5.5          Stronger general reasoning
Code + domain knowledge mix   gpt-5.5          Broader world knowledge
Quick syntax fix              gpt-5.4-mini     Good enough, basically free

The teams getting the best cost-to-quality ratio aren't picking one model. They route. Plan with GPT-5.5, execute with GPT-5.3-Codex, fall back to a cheaper model for trivial edits. The trick is making that routing painless instead of a config nightmare.

That's where running everything through one OpenAI-compatible endpoint helps. With KissAPI you call GPT-5.3-Codex, GPT-5.5, Claude Opus 4.8, and Gemini through the same base URL and key, so swapping the model string is the entire migration. No second SDK, no second billing dashboard.
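
The routing table above can be sketched as a plain lookup. The model IDs come from the table; the task labels, the `pick_model` name, and the choice of default are assumptions made for illustration.

```python
# Model IDs are taken from the routing table; the task labels are hypothetical.
ROUTES = {
    "agent_loop": "gpt-5.3-codex",
    "multi_file_refactor": "gpt-5.3-codex",
    "architecture": "gpt-5.5",
    "code_plus_domain": "gpt-5.5",
    "quick_syntax_fix": "gpt-5.4-mini",
}

# Assumption: unlabeled work goes to the cheaper worker model.
DEFAULT_MODEL = "gpt-5.3-codex"


def pick_model(task_type: str) -> str:
    """Return the model ID for a task label, falling back to the default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

The result drops straight into the `model` argument of the existing calls, for example `model=pick_model("architecture")`.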

Three Cost Mistakes to Avoid

  1. Running the whole agent on GPT-5.5. Using the flagship for planning is fine. For the worker loop it's overkill and expensive.
  2. Ignoring cached input. Cached input on GPT-5.3-Codex is $0.175/1M, one tenth of the $1.75 standard input rate. Keep your system prompt and tool descriptions stable so they cache.
  3. No turn cap. Agents loop. Without a max-turn or max-token ceiling per task, a single confused run can burn dollars. Set a hard limit and log usage every turn.
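
The turn cap from mistake 3 could look something like the sketch below. The limits are placeholder values, and `run_turn` is a hypothetical callable standing in for one model call plus tool execution; it is assumed to return whether the task is finished and how many output tokens the turn used.

```python
from typing import Callable

# Hypothetical limits; tune them to your own tasks.
MAX_TURNS = 20
MAX_OUTPUT_TOKENS = 40_000


def run_task(run_turn: Callable[[int], tuple[bool, int]]) -> str:
    """Drive an agent loop under a hard turn cap and output-token ceiling."""
    spent = 0
    for turn in range(1, MAX_TURNS + 1):
        done, used = run_turn(turn)
        spent += used
        # Log usage every turn so a runaway task is visible in the logs.
        print(f"turn {turn}: out={used} total_out={spent}")
        if done:
            return "done"
        if spent >= MAX_OUTPUT_TOKENS:
            return "stopped: token ceiling"
    return "stopped: turn cap"
```

Returning a reason string instead of raising keeps the caller in charge of what happens next: retry with a fresh context, escalate to a person, or give up.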

Quick Sanity Check on Your Bill

Before you commit a model choice, do the napkin math. A typical agent turn might use 8K input (mostly cached) and 1.5K output. Treating all 8K as cached, GPT-5.3-Codex comes to roughly (8000 × 0.175 + 1500 × 14) / 1,000,000 ≈ $0.022 per turn. On GPT-5.5 the same turn is (8000 × 0.50 + 1500 × 30) / 1,000,000 ≈ $0.049. Multiply by thousands of turns a week and the difference is your engineering coffee budget, or a new laptop.
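
The same napkin math as a small function, in case you want to plug in your own token counts. Prices are copied from the pricing table in this guide; the `turn_cost` name and the split between cached and fresh input are assumptions for illustration.

```python
# USD per 1M tokens, copied from the pricing table in this guide.
PRICES = {
    "gpt-5.3-codex": {"input": 1.75, "cached": 0.175, "output": 14.00},
    "gpt-5.5": {"input": 5.00, "cached": 0.50, "output": 30.00},
}


def turn_cost(model: str, cached_in: int, fresh_in: int, out: int) -> float:
    """Estimate the USD cost of one turn from its token counts."""
    p = PRICES[model]
    total = cached_in * p["cached"] + fresh_in * p["input"] + out * p["output"]
    return total / 1_000_000


codex = turn_cost("gpt-5.3-codex", cached_in=8000, fresh_in=0, out=1500)
flagship = turn_cost("gpt-5.5", cached_in=8000, fresh_in=0, out=1500)
print(f"codex=${codex:.4f} flagship=${flagship:.4f}")
```

If only part of the input hits the cache, move those tokens from `cached_in` to `fresh_in`. Both per-turn figures rise, but output tokens still dominate the total.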

Run Codex and Flagship Models Through One Endpoint

Create a free account at kissapi.ai/register and route GPT-5.3-Codex, GPT-5.5, Claude, and Gemini with one OpenAI-compatible key.

FAQ

What is the GPT-5.3-Codex model ID for the API?

Use gpt-5.3-codex as the model name in the OpenAI-compatible Responses or Chat Completions API. It's the dedicated Codex model tuned for long-horizon agentic coding, separate from the general-purpose gpt-5.5.

How much does GPT-5.3-Codex cost compared to GPT-5.5?

GPT-5.3-Codex is $1.75 per 1M input tokens and $14 per 1M output, with cached input at $0.175. GPT-5.5 is $5 input and $30 output. For sustained coding loops, Codex is about 2.1x cheaper on output and about 2.9x cheaper on input.

Should I use GPT-5.3-Codex or GPT-5.5 for coding?

Use GPT-5.3-Codex for agent loops, multi-file edits, and test-and-fix cycles that run many turns. Use GPT-5.5 for one-shot architecture reasoning or tasks that mix code with heavy general knowledge. Many teams route between both.