How do I track Codex CLI API cost?

Start with the new /usage command, export or record daily token totals, multiply input, cached input, and output tokens by your model prices, then set per-repository budgets and fallback routes for expensive coding-agent tasks.

Should I import everything from Claude Code into Codex CLI?

No. Import only reusable setup, repository rules, and recent chats that still matter. Old transcripts can add noise and increase token cost, so prune stale sessions before migrating.

OpenAI Codex CLI 0.140.0: Usage Tracking, Claude Code Import, and Safer Agent Workflows

Published June 17, 2026 · 9 min read

OpenAI shipped Codex CLI 0.140.0 on June 15, 2026, and the release is more important than it looks. The headline changes are practical: /usage views for daily, weekly, and cumulative token activity; /import for pulling selected setup and recent chats from Claude Code; permanent session deletion; better MCP reliability; and stronger handling for large pasted blocks and image attachments.

That sounds like housekeeping. It isn't. It's a sign that coding agents are moving from fun demos into real engineering budgets. Once an agent can run long tasks, spawn subagents, call tools, import history, and chew through a repository, the painful question becomes: what did that just cost, and can we reproduce it safely?

This guide turns the June release into a developer playbook. We'll cover how to use the new usage data, how to migrate from Claude Code without dragging in stale context, and how to build a routing layer that keeps Codex useful without letting it burn tokens in the background.

News hook: OpenAI's Codex changelog lists Codex CLI 0.140.0 on June 15, 2026, with new /usage token activity views and /import support for Claude Code setup, project configuration, and recent chats.

Why `/usage` matters more than another model picker

Most AI coding bills don't explode because one request is expensive. They creep up because agents hide the loop: read files, plan, inspect tests, edit, run, retry, ask a subagent, summarize, and repeat. A single human prompt can become dozens of model calls.

The new /usage views give teams a first-party place to inspect token activity by day, week, and lifetime. Treat that as operational telemetry, not trivia. If you're running Codex in a serious repo, your first task is to build a tiny cost habit around it.

# Upgrade to the release mentioned in OpenAI's June 15, 2026 changelog
npm install -g @openai/codex@0.140.0

# Inside Codex CLI
/usage

After every substantial session, record three things:

Which repository or service the session touched.
What class of work it performed: planning, implementation, review, test repair, migration.
Daily token movement before and after the session.

You don't need perfect accounting on day one. You need enough signal to catch the bad patterns: agents reading the same huge files repeatedly, retrying broken commands, or using a frontier model for simple grep-and-edit work.
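One way to keep that habit cheap is a tiny session log. The sketch below is an assumption about how you might record the three items above. `SessionRecord`, `append_record`, `flag_heavy`, the repository names, and the 250k threshold are all hypothetical, and the token totals are numbers you read off /usage by hand, not values pulled from Codex.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class SessionRecord:
    repo: str           # repository or service the session touched
    task_class: str     # planning, implementation, review, test repair, migration
    tokens_before: int  # daily total read from /usage before the session
    tokens_after: int   # daily total read from /usage after the session

    @property
    def tokens_used(self) -> int:
        return self.tokens_after - self.tokens_before


def append_record(stream, record, write_header=False):
    """Write one session as a CSV row to any text stream (a file opened in append mode also works)."""
    names = [f.name for f in fields(SessionRecord)] + ["tokens_used"]
    writer = csv.DictWriter(stream, fieldnames=names)
    if write_header:
        writer.writeheader()
    writer.writerow({**asdict(record), "tokens_used": record.tokens_used})


def flag_heavy(records, threshold):
    """Return the sessions whose token movement exceeds an arbitrary threshold."""
    return [r for r in records if r.tokens_used > threshold]


# Hypothetical sessions, typed in by hand after checking /usage.
records = [
    SessionRecord("billing-service", "test repair", 120_000, 410_000),
    SessionRecord("docs-site", "review", 410_000, 432_000),
]

buffer = io.StringIO()
for i, rec in enumerate(records):
    append_record(buffer, rec, write_header=(i == 0))

print(buffer.getvalue())
print("Heavy sessions:", [r.repo for r in flag_heavy(records, threshold=250_000)])
```

A CSV is enough at this stage because the goal is spotting outliers by repository and task class, not invoicing.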

A simple cost model for Codex sessions

OpenAI's pricing page currently lists gpt-5.3-codex at $1.75 per 1M input tokens, $0.175 per 1M cached input tokens, and $14 per 1M output tokens. Your exact model may differ, but the math pattern is the same.

Session type	Input tokens	Output tokens	Approx. cost on gpt-5.3-codex
Small bug fix	60k	8k	~$0.22
Medium feature	250k	35k	~$0.93
Long refactor loop	1.2M	180k	~$4.62
Runaway agent day	8M	1.1M	~$29.40

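The lesson is blunt: output tokens dominate fast. At $14 per 1M, an output token costs eight times as much as a fresh input token, so in every row above output is roughly half the bill despite being about an eighth of the tokens. Long explanations, repeated plans, verbose reviews, and failed loops cost more than most teams expect. Usage tracking is useful only if you also change behavior.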
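To see where the money goes, it helps to split each row of the table into input cost and output cost. This is a minimal sketch using the gpt-5.3-codex prices quoted above and the table's token counts; the `breakdown` helper is my own name, and cached input is ignored to match the table.

```python
INPUT_PRICE = 1.75    # USD per 1M input tokens
OUTPUT_PRICE = 14.00  # USD per 1M output tokens

# (input tokens, output tokens) for each row of the table above
SESSIONS = {
    "Small bug fix": (60_000, 8_000),
    "Medium feature": (250_000, 35_000),
    "Long refactor loop": (1_200_000, 180_000),
    "Runaway agent day": (8_000_000, 1_100_000),
}


def breakdown(input_tokens, output_tokens):
    """Return (total cost in USD, fraction of that cost due to output tokens)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE
    total = input_cost + output_cost
    return total, output_cost / total


for name, (inp, out) in SESSIONS.items():
    total, share = breakdown(inp, out)
    print(f"{name}: ${total:.2f} total, output is {share:.0%} of the cost")
```

If your own sessions show a much higher output share than these rows, verbose responses are the first thing to trim.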

Turn `/usage` into budgets

Here's a lightweight budget script you can keep next to your engineering playbooks. It won't talk to Codex directly; you paste token totals from /usage or your provider logs. The point is to make cost review boring and repeatable.

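#!/usr/bin/env python3
# codex_cost.py: estimate the dollar cost of a session from pasted token totals.

# USD per 1M tokens, from the pricing quoted above.
PRICES = {
    "gpt-5.3-codex": {
        "input": 1.75,
        "cached_input": 0.175,
        "output": 14.00,
    }
}

def cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated cost in USD.

    input_tokens is the total input count, including any cached tokens.
    """
    p = PRICES[model]
    # Cached tokens are billed at the cheaper rate, so split them out first.
    fresh_input = max(input_tokens - cached_input_tokens, 0)
    return (
        fresh_input / 1_000_000 * p["input"] +
        cached_input_tokens / 1_000_000 * p["cached_input"] +
        output_tokens / 1_000_000 * p["output"]
    )

# The "Medium feature" row from the table: about $0.93.
print(cost("gpt-5.3-codex", input_tokens=250_000, output_tokens=35_000))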

Set budgets by task, not by person. For example:

PR review: $0.25 soft cap, $0.75 hard cap.
Single bug fix: $1 soft cap, $3 hard cap.
Large migration: explicit approval after the planning pass.
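Those caps are easy to encode. Here's a rough sketch, assuming you already have a dollar estimate for the task. `BudgetStatus`, `BUDGETS`, and `check_budget` are hypothetical names, and the large-migration rule is left out because it's an approval step rather than a dollar figure.

```python
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"
    SOFT_CAP = "soft cap reached: warn and review"
    HARD_CAP = "hard cap reached: stop the task"


# (soft cap, hard cap) in USD, copied from the list above.
BUDGETS = {
    "pr_review": (0.25, 0.75),
    "bug_fix": (1.00, 3.00),
}


def check_budget(task_type, spent_usd):
    """Compare estimated spend for one task against its soft and hard caps."""
    soft, hard = BUDGETS[task_type]
    if spent_usd >= hard:
        return BudgetStatus.HARD_CAP
    if spent_usd >= soft:
        return BudgetStatus.SOFT_CAP
    return BudgetStatus.OK


for task, spent in [("pr_review", 0.10), ("pr_review", 0.40), ("bug_fix", 3.20)]:
    print(task, f"${spent:.2f}", "->", check_budget(task, spent).value)
```

The soft cap is a prompt for a human to look; only the hard cap should stop work automatically.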

If your team uses an OpenAI-compatible gateway like KissAPI, put the same budget rules at the routing layer too. CLI-side visibility is helpful; gateway-side limits are what stop a bad loop at 2 a.m.

Using `/import` from Claude Code without importing the mess

Codex CLI 0.140.0 also adds /import for selectively importing setup, project configuration, and recent chats from Claude Code. That's useful if you have repository conventions, tool notes, or active task context that you don't want to rewrite.

But don't import everything. Old chat history has a half-life. Yesterday's debugging trail can become today's misleading instruction. Before you migrate, sort context into three buckets:

Import?	Content type	Reason
Yes	Repo commands, test commands, deployment notes	Reusable and easy to verify
Maybe	Recent design decisions	Useful if still active
No	Old failed attempts, stale logs, one-off speculation	Expensive noise

# In Codex CLI 0.140.0
/import

# Recommended migration rule:
# import setup and active project config first;
# import chats only when they explain current work.

My opinion: the best import is small. Give the agent durable facts and current constraints. Leave the archaeological record behind.
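If you want the three buckets written down before you run /import, a small triage helper is enough. This sketch doesn't touch Codex or Claude Code at all; it only classifies items you've listed yourself. The category names, the `triage` function, and the 14-day window are all assumptions, so tune them to your repo.

```python
from datetime import date

# Hypothetical labels for the rows of the table above.
DURABLE_KINDS = {"repo_commands", "test_commands", "deployment_notes"}
NOISE_KINDS = {"failed_attempt", "stale_log", "speculation"}


def triage(kind, last_touched, today, max_age_days=14):
    """Return 'import', 'review', or 'skip' for one piece of context."""
    if kind in DURABLE_KINDS:
        return "import"   # reusable and easy to verify
    if kind in NOISE_KINDS:
        return "skip"     # expensive noise, whatever its age
    # Everything else (e.g. design decisions) is a "maybe":
    # worth a human look only while it is still recent.
    age_days = (today - last_touched).days
    return "review" if age_days <= max_age_days else "skip"


today = date(2026, 6, 17)
items = [
    ("test_commands", date(2026, 1, 10)),
    ("design_decision", date(2026, 6, 12)),
    ("design_decision", date(2026, 3, 2)),
    ("stale_log", date(2026, 6, 16)),
]
for kind, touched in items:
    print(kind, touched.isoformat(), "->", triage(kind, touched, today))
```

Note that durable setup is imported regardless of age, while a log from yesterday is still skipped: the bucket matters more than the date.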

Build a routing policy for coding agents

Once you can see usage, the next step is routing. Not every coding-agent turn deserves the same model. A practical setup has three lanes:

Fast lane: cheap model for file search, summaries, and straightforward edits.
Default lane: Codex-grade model for implementation and multi-file reasoning.
Escalation lane: strongest model for architecture changes, security-sensitive edits, and broken test loops.
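The three lanes can be sketched as a lookup. Treat this as an illustration only: the task labels, `pick_lane`, and `pick_model` are hypothetical, and the fast and escalation model names are placeholders you'd replace with whatever your provider offers.

```python
# Lane -> model. Only the default lane uses a model named in this article;
# the other two are placeholders.
LANES = {
    "fast": "your-cheap-model",           # placeholder
    "default": "gpt-5.3-codex",
    "escalation": "your-strongest-model",  # placeholder
}

# Hypothetical task labels matching the lane descriptions above.
FAST_TASKS = {"file_search", "summary", "simple_edit"}
ESCALATION_TASKS = {"architecture_change", "security_edit", "broken_test_loop"}


def pick_lane(task_type):
    """Map a task label to a lane; anything unrecognized gets the default lane."""
    if task_type in FAST_TASKS:
        return "fast"
    if task_type in ESCALATION_TASKS:
        return "escalation"
    return "default"


def pick_model(task_type):
    return LANES[pick_lane(task_type)]


for task in ["file_search", "implementation", "security_edit"]:
    print(task, "->", pick_lane(task), "->", pick_model(task))
```

Defaulting unknown tasks to the middle lane keeps a mislabeled task from silently landing on either the cheapest or the most expensive model.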

If you're using a gateway, the client can stay mostly unchanged while the backend chooses the lane.

curl https://api.kissapi.ai/v1/chat/completions \
  -H "Authorization: Bearer $KISSAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.3-codex",
    "messages": [
      {"role": "system", "content": "You are a concise coding agent. Avoid long explanations unless asked."},
      {"role": "user", "content": "Patch the failing auth test and explain the minimal diff."}
    ]
  }'

The important bit isn't the exact endpoint. It's the discipline: one interface, multiple routes, visible spend, and fallbacks when a model is at capacity.
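A fallback chain is the piece most teams skip, so here's a bare-bones sketch of one. It makes no network calls: `send` stands in for whatever function actually hits your endpoint, and `ModelUnavailable`, `call_with_fallback`, and the model names are hypothetical.

```python
class ModelUnavailable(Exception):
    """Raised by the caller-supplied send function when a route is at capacity."""


def call_with_fallback(models, send):
    """Try each model in order and return (model, reply) from the first that answers."""
    last_error = None
    for model in models:
        try:
            return model, send(model)
        except ModelUnavailable as exc:
            last_error = exc  # remember why this route failed, then try the next
    raise RuntimeError("all routes failed") from last_error


# Stand-in for a real API call: the primary route is "at capacity".
def fake_send(model):
    if model == "primary-model":
        raise ModelUnavailable(f"{model} is at capacity")
    return f"ok from {model}"


print(call_with_fallback(["primary-model", "backup-model"], fake_send))
```

Only capacity errors trigger the fallback here. Other failures, such as a bad request, should surface immediately instead of being retried on a more expensive model.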

Node.js: add a per-task token guardrail

For internal tools that call an OpenAI-compatible API, wrap calls with a per-task budget. This example estimates the dollar cost of each response from its usage block and refuses to start another turn once the running total reaches the cap.

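import OpenAI from "openai";

// OpenAI-compatible client pointed at the gateway.
const client = new OpenAI({
  apiKey: process.env.KISSAPI_KEY,
  baseURL: "https://api.kissapi.ai/v1"
});

// USD per 1M tokens. Cached input is priced at the full input rate here,
// so the estimate errs on the high side.
const PRICE = { input: 1.75, output: 14.00 };
const maxDollars = 2.00;
let spent = 0; // running total for the current task

// Call this when a new task starts so the cap applies per task, not per process.
export function resetBudget() {
  spent = 0;
}

// Estimate the dollar cost of one response from its usage block.
function estimate(usage) {
  return (usage.prompt_tokens / 1_000_000 * PRICE.input) +
         (usage.completion_tokens / 1_000_000 * PRICE.output);
}

export async function guardedAgentTurn(messages) {
  // Checked before each call, so the final turn can overshoot the cap slightly.
  if (spent >= maxDollars) throw new Error("Agent budget reached");

  const res = await client.chat.completions.create({
    model: "gpt-5.3-codex",
    messages,
    temperature: 0.2
  });

  // If the response carries no usage block, the turn is not counted.
  if (res.usage) spent += estimate(res.usage);
  return res.choices[0].message.content;
}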

That small guardrail changes team behavior. People stop treating agents like free background workers and start giving them sharper tasks.

Operational checklist after upgrading

Upgrade to Codex CLI 0.140.0 or later.
Run /usage before and after one real coding session.
Write down soft and hard budgets for PR review, bug fix, and refactor tasks.
Use /import carefully: setup first, recent chats only when relevant.
Delete stale sessions you no longer need.
Move fallback and budget enforcement into your API gateway if you run agents in production.

The June 15 release won't magically make coding agents cheaper. It gives you the instrumentation to stop guessing. That's the useful part. Measure sessions, prune context, route by task, and keep a hard stop somewhere outside the agent itself.

Need one API route for Codex, GPT, and Claude workflows?

KissAPI gives developers an OpenAI-compatible endpoint for multi-model routing, fallback, and cost controls. Create a free account at kissapi.ai/register and test your coding-agent stack today.

FAQ

What changed in Codex CLI 0.140.0?

OpenAI added /usage token activity views, /import from Claude Code, permanent session deletion, better MCP reliability, and improvements for oversized pasted text and attachments.

How should teams use Codex usage data?

Track token movement by repository and task type, then set soft and hard budgets. The goal isn't perfect accounting. It's catching runaway loops and expensive task patterns early.

Is Claude Code import worth using?

Yes, but selectively. Import durable setup, repo rules, and current project context. Avoid stale debugging transcripts because they add cost and can steer the agent in the wrong direction.