GPT-5-Codex API with Codex CLI: Setup Guide 2026

Published May 22, 2026 · 9 min read

OpenAI has turned Codex from a nostalgic model name into a real developer workflow again. The important bit for builders is simple: GPT-5-Codex is available through the API and can be used from Codex CLI with an API key. That makes it useful outside the chat UI: local repo work, pull request review, CI triage, migration scripts, and long-running code edits.

This guide shows a practical setup. Not a hype tour. The goal is to get Codex CLI talking to GPT-5-Codex, keep secrets out of your repo, and avoid the two mistakes that make coding agents expensive: sending too much context and retrying failures blindly.

When GPT-5-Codex is the right model

Use GPT-5-Codex when the task is mostly code and the output will be judged by tests, diffs, or compiler errors. It is a poor use of money for simple text rewriting, JSON cleanup, or ticket classification. A cheaper general model is fine there.

Task	Good fit?	Why
PR review	Yes	It can reason over diffs, tests, and style rules.
Fix failing CI	Yes	The model can connect logs to likely code changes.
Large refactor	Sometimes	Good if you split the work into small checkpoints.
Summarize support tickets	No	Use a cheaper fast model.
Generate boilerplate	Maybe	Only worth it if the boilerplate has tricky constraints.
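
The table above can be turned into a small routing function. This is a minimal sketch, not a recommended policy: the task labels and the cheaper model name are placeholders invented for illustration, and only gpt-5-codex comes from this guide.

```python
# Hypothetical task labels and a placeholder cheap model; adjust to your own setup.
CODE_MODEL = "gpt-5-codex"
CHEAP_MODEL = "your-cheaper-general-model"  # placeholder, not a real model ID

# Tasks from the table where a coding model is a clear fit.
# Large refactors are included on the assumption that they are split into checkpoints.
CODE_TASKS = {"pr_review", "fix_ci", "refactor"}


def pick_model(task: str, tricky_constraints: bool = False) -> str:
    """Return the model to use for a task label."""
    if task in CODE_TASKS:
        return CODE_MODEL
    # Boilerplate only earns the coding model when it has tricky constraints.
    if task == "boilerplate" and tricky_constraints:
        return CODE_MODEL
    # Summaries, labels, and everything else go to the cheaper model.
    return CHEAP_MODEL
```

Anything the function does not recognize falls through to the cheaper model, so a typo in a task label costs quality rather than money.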

Basic API call

If you just want to test that your key and model work, start with the Responses API. Keep the prompt tiny. A lot of developers test a new coding model by throwing an entire repo at it, then wonder why the first request costs too much.

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-codex",
    "input": "Write a TypeScript function that validates an email and include 3 tests."
  }'

If you use an OpenAI-compatible gateway, the call shape stays the same. You change the base URL and key. That is where KissAPI can be handy: one API key, OpenAI-compatible syntax, and the option to route coding tasks to GPT-5-Codex while sending cheaper tasks elsewhere.

curl https://api.kissapi.ai/v1/responses \
  -H "Authorization: Bearer $KISSAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-codex",
    "input": "Review this patch for security bugs: ..."
  }'

Set up Codex CLI

Install Codex CLI from the official npm package (@openai/codex) or the openai/codex GitHub repository, then keep configuration in environment variables. That works better than hard-coding keys in a config file because the same machine may run local experiments, CI jobs, and different client projects.

export OPENAI_API_KEY="sk-your-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export CODEX_MODEL="gpt-5-codex"

For a gateway endpoint:

export OPENAI_API_KEY="$KISSAPI_API_KEY"
export OPENAI_BASE_URL="https://api.kissapi.ai/v1"
export CODEX_MODEL="gpt-5-codex"

The exact command flags can change between Codex CLI releases, so check codex --help on your installed version. The pattern is stable: provide an API key, a base URL if you are not using the default OpenAI endpoint, and choose the model explicitly instead of relying on a default.
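
For your own scripts, the same three variables can be read once and checked up front, so a missing key fails immediately instead of halfway through a job. A minimal sketch, assuming the variable names used in this guide; the CodexSettings class and load_settings function are hypothetical names, not part of Codex CLI or the OpenAI SDK.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CodexSettings:
    api_key: str
    base_url: str
    model: str


def load_settings(env=None) -> CodexSettings:
    """Read settings from the environment and fail fast if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return CodexSettings(
        api_key=api_key,
        # Default to the OpenAI endpoint; a gateway overrides this variable.
        base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/"),
        # Choose the model explicitly instead of relying on a client default.
        model=env.get("CODEX_MODEL", "gpt-5-codex"),
    )
```

Passing a plain dict as env makes the function easy to test without touching the real environment.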

A useful local workflow

Here is the loop I like for real repos:

Ask Codex to inspect only the relevant files.
Make it propose a plan before editing.
Apply one small patch.
Run tests locally.
Feed back only the failing test output, not the entire terminal history.

That last line matters. Coding agents get expensive when every turn includes old logs, old diffs, and files the model no longer needs. Treat context like a scarce resource.

git checkout -b codex/fix-login-timeout
codex "Inspect src/auth and tests/auth. Find why login timeout is flaky. Propose a minimal fix first."
# review the plan
codex "Apply the smallest safe patch. Do not rewrite unrelated files."
npm test -- tests/auth/login-timeout.test.ts
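
The last step of the loop, feeding back only the failing output, can be approximated by keeping the tail of the test log. This is a rough sketch rather than a test-runner integration: trim_log is a hypothetical helper, and it assumes the failure summary sits near the end of the output, which is common but not guaranteed.

```python
def trim_log(text: str, max_lines: int = 200, max_chars: int = 12000) -> str:
    """Keep only the end of a log, bounded by both line count and character count."""
    lines = text.splitlines()
    tail = lines[-max_lines:] if max_lines > 0 else []
    trimmed = "\n".join(tail)
    # If the tail is still too long, keep its end, where the summary usually is.
    return trimmed[-max_chars:] if max_chars > 0 else ""


if __name__ == "__main__":
    sample = "\n".join(f"line {i}" for i in range(1000))
    print(trim_log(sample, max_lines=3))
```

The defaults mirror the limits used elsewhere in this guide: 200 lines and 12,000 characters.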

GitHub PR review example

For CI, do not send the whole repository. Send the pull request diff, the test output, and your review rules. A small script can collect that context and call the API.

# Export the context under the names the reviewer script reads from the environment.
export PR_DIFF="$(gh pr diff "$PR_NUMBER" --repo "$GITHUB_REPOSITORY")"
export TEST_LOGS="$(tail -n 200 test-output.log)"
export CODEX_MODEL="gpt-5-codex"

python scripts/review_pr.py

A minimal Python reviewer using the Responses API:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
)

prompt = f"""
You are reviewing a pull request.
Focus on correctness, security, missing tests, and risky migrations.
Return concise Markdown bullets. Do not nitpick formatting.

DIFF:
{os.environ.get('PR_DIFF', '')[:60000]}

TEST LOGS:
{os.environ.get('TEST_LOGS', '')[-12000:]}
"""

response = client.responses.create(
    model=os.getenv("CODEX_MODEL", "gpt-5-codex"),
    input=prompt
)

print(response.output_text)
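
The reviewer above simply cuts the diff at 60,000 characters, which silently drops whatever comes after the cap. One alternative is to split the diff per file and review it in batches. The sketch below assumes the diff uses git's "diff --git" file headers; both function names are hypothetical.

```python
def split_diff_by_file(diff: str) -> list[str]:
    """Split a git-style diff into one chunk per file."""
    chunks, current = [], []
    for line in diff.splitlines(keepends=True):
        # A new file header closes the previous chunk.
        if line.startswith("diff --git ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def batch_chunks(chunks: list[str], max_chars: int = 60000) -> list[str]:
    """Greedily pack per-file chunks into batches of at most max_chars.

    A single file larger than max_chars still becomes its own oversized batch;
    this sketch does not split inside a file.
    """
    batches, current, size = [], [], 0
    for chunk in chunks:
        if current and size + len(chunk) > max_chars:
            batches.append("".join(current))
            current, size = [], 0
        current.append(chunk)
        size += len(chunk)
    if current:
        batches.append("".join(current))
    return batches
```

Each batch can then be sent as its own review request, with the same rules prompt and only the logs relevant to those files.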

Cost controls that actually work

Most teams try to reduce coding-agent cost by yelling “be concise” in the system prompt. That helps a little. The real savings come from routing and context limits.

Cap diff size. If a PR is huge, review changed modules in batches.
Trim logs. The last 200 to 400 lines are usually enough. Full CI logs are mostly noise.
Route by task. Use GPT-5-Codex for code reasoning, cheaper models for summaries and labels.
Stop retry storms. Retry 429 and 5xx with backoff. Do not retry 401, 403, or invalid model errors.
Set a per-job budget. CI should fail closed with a useful message instead of burning tokens all night.

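# Retry only transient failures: timeouts (408), conflicts (409), rate limits (429), and 5xx.
# Auth errors (401, 403) and invalid-model errors are never retried.
RETRYABLE_STATUS = {408, 409, 429, 500, 502, 503, 504}

def should_retry(status_code: int) -> bool:
    return status_code in RETRYABLE_STATUS

def backoff_seconds(attempt: int) -> float:
    # Exponential backoff capped at 30 seconds: 1, 2, 4, 8, 16, 30, ...
    return float(min(30, 2 ** attempt))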
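
Those two helpers only decide whether and how long to wait. A bounded retry loop with a per-job token budget might look like the sketch below. Everything here is an assumption for illustration: send is a hypothetical callable that returns a (status_code, tokens_used, body) tuple, and JobBudget is not part of any SDK. The retry set and backoff are repeated so the block runs on its own.

```python
import time

RETRYABLE = {408, 409, 429, 500, 502, 503, 504}  # same set as the helper above


class BudgetExceeded(RuntimeError):
    pass


class JobBudget:
    """Hypothetical per-job token budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"token budget exceeded: {self.used}/{self.max_tokens}")


def call_with_retry(send, budget: JobBudget, max_attempts: int = 3, sleep=time.sleep):
    """Call send() up to max_attempts times, charging the budget on every attempt."""
    for attempt in range(max_attempts):
        status, tokens, body = send()
        budget.charge(tokens)  # failed attempts can still cost tokens
        if status < 400:
            return body
        # Give up on non-retryable errors and on the final attempt.
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with status {status}")
        sleep(min(30, 2 ** attempt))
```

The sleep function is injectable so the loop can be tested without real delays, and a BudgetExceeded error gives CI a clear message to fail on.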

Fallback rules

Do not make every fallback automatic. If GPT-5-Codex is unavailable during a PR review, falling back to a weaker model may be fine. If it is doing a database migration patch, you probably want to stop and ask for human review.

Failure	Recommended action
429 rate limit	Back off, then retry once or twice.
5xx provider error	Retry, then use a backup coding model for review-only tasks.
401/403 auth error	Stop. Fix the key or account permissions.
Context length error	Split the diff or summarize files first.
Low-confidence code patch	Require human review before commit.
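
The table can also be written as an explicit decision function, so fallback behavior is reviewed like any other code. This is a sketch under stated assumptions: the failure and task labels are hypothetical, and two choices are filled in that the table does not spell out (stop after exhausted rate-limit retries, and hand non-review tasks to a human after exhausted provider-error retries).

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    FALLBACK_MODEL = "fallback_model"
    SPLIT_CONTEXT = "split_context"
    HUMAN_REVIEW = "human_review"
    STOP = "stop"


def decide(failure: str, task: str, attempt: int, max_retries: int = 2) -> Action:
    """Map a failure label and task type to the next action."""
    if failure == "auth":  # 401/403: retrying cannot fix a bad key
        return Action.STOP
    if failure == "context_length":
        return Action.SPLIT_CONTEXT
    if failure == "low_confidence":
        return Action.HUMAN_REVIEW
    if failure in {"rate_limit", "provider_error"}:
        if attempt < max_retries:
            return Action.RETRY
        if failure == "provider_error":
            # A weaker backup model is acceptable for review-only work;
            # anything that writes code goes to a human instead (assumption).
            return Action.FALLBACK_MODEL if task == "review" else Action.HUMAN_REVIEW
        return Action.STOP  # rate limit persisted after retries (assumption)
    # Unknown failures stop the job rather than guess.
    return Action.STOP
```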

My take: GPT-5-Codex is best treated as a senior code reviewer and patch author, not a magical autonomous engineer. Give it narrow context, hard constraints, and tests. It performs better, and your bill stays sane.

Production checklist

Use environment variables or a secret manager for API keys.
Pin the model name in CI so upgrades do not change behavior silently.
Log token usage, status codes, retry counts, and latency.
Keep generated patches small enough for humans to review.
Block the agent from touching secrets, lockfiles, infra files, or migrations unless the task explicitly requires it.
Run tests before posting a final review or opening an automated PR.
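
The rule about blocking sensitive files can be enforced with a check on the list of changed paths before a patch is accepted. A minimal sketch: the patterns below are illustrative guesses at what counts as secrets, lockfiles, infra, and migrations, and blocked_paths is a hypothetical helper, so adapt both to your repository layout.

```python
from fnmatch import fnmatchcase

# Illustrative patterns only; extend for your own repo.
PROTECTED_PATTERNS = [
    ".env", ".env.*", "*.pem",
    "package-lock.json", "yarn.lock", "poetry.lock",
    "infra/*", "terraform/*",
    "migrations/*", "*/migrations/*",
]


def blocked_paths(changed_files, allow=()):
    """Return the changed paths that match a protected pattern.

    Paths use forward slashes. Patterns in allow lift the block for tasks
    that explicitly require touching those files.
    """
    blocked = []
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in allow):
            continue
        name = path.rsplit("/", 1)[-1]
        # Match against the full path and the bare file name, so that
        # "web/package-lock.json" is caught by "package-lock.json".
        if any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in PROTECTED_PATTERNS):
            blocked.append(path)
    return blocked
```

A CI step can fail the job whenever the returned list is non-empty and print the offending paths for the reviewer.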

Bottom line

GPT-5-Codex is worth testing if you already use Codex CLI or want API-backed code review in CI. Start with review tasks, not autonomous rewrites. Keep the context small. Add budget limits before the first scheduled workflow. Once the workflow is boring and predictable, then let it touch larger patches.