Codex CLI GitHub Actions Setup Guide 2026: Run AI Code Reviews in CI

Published May 20, 2026 · 9 min read

Running Codex CLI on your laptop is useful. Running it inside GitHub Actions is where it starts to feel like infrastructure. Every pull request can get a short review, every failed test run can get a first-pass diagnosis, and every release branch can get a risk summary before a human merges it.

The trick is not “let the agent edit production code whenever it wants.” That’s a bad idea. The better pattern is narrower: use Codex CLI as a read-heavy CI reviewer, keep secrets locked in GitHub Actions, cap token spend, and require humans to apply patches.

What Codex CLI Should Do in CI

A CI coding agent should have boring jobs. Boring is good here. You want repeatable comments, not a mysterious robot pushing commits at 3 a.m.

Use case	Good CI behavior	Avoid
Pull request review	Summarize risky diffs and missing tests	Rewriting the branch automatically
Test failure triage	Explain likely cause from logs	Guessing without logs
Security review	Flag obvious auth, injection, and secret leaks	Replacing real security tooling
Release notes	Draft notes from merged commits	Inventing customer impact

My default recommendation: start with pull request review only. Once the output is consistently useful, add test failure triage. Save automated patch generation for internal repos where rollbacks are cheap.

Prerequisites

A repository using GitHub Actions.
A Codex CLI install command that works in a clean Linux runner.
An API key stored as a GitHub Actions secret.
A model choice for review tasks. Don’t use your most expensive reasoning model for every diff.

If your provider supports an OpenAI-compatible endpoint, this setup is easier. For example, KissAPI lets you point OpenAI-style SDKs and tools at one base URL, then route requests across models without changing CI YAML every time pricing or availability changes.

Step 1: Add GitHub Secrets

In GitHub, open Settings → Secrets and variables → Actions → New repository secret. Add:

AI_API_KEY: your API key
AI_BASE_URL: optional, for gateways such as https://api.kissapi.ai/v1
AI_REVIEW_MODEL: optional model name, such as gpt-5.5-mini or claude-sonnet-4-6

Never put the key directly in YAML. Also avoid running this workflow on untrusted fork pull requests with full secret access. For public repos, use pull_request without secrets, or use pull_request_target only if you really understand the security model.

Step 2: Smoke-Test the API Before Running the Agent

A quick API check saves noisy CI failures. Add this near the start of the workflow:

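# --fail makes curl exit non-zero on HTTP errors, so a 401/404/429 actually fails the step.
curl -sS --fail "${AI_BASE_URL:-https://api.openai.com/v1}/chat/completions" \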
  -H "Authorization: Bearer $AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AI_REVIEW_MODEL:-gpt-5.5-mini}"'",
    "messages": [{"role":"user","content":"Reply with ok."}],
    "max_tokens": 10
  }'

If this fails with 401, your key is wrong. If it fails with 404, your base URL probably has the wrong path. If it fails with 429, don’t keep retrying blindly. Add backoff or switch to a cheaper fallback model.
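The 429 advice can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: `RateLimited`, `call_with_backoff`, and the `send` callable are hypothetical names for illustration, and `send` stands in for whatever function makes your real API call and raises on a 429.

```python
import time
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error that `send` raises when the API answers 429."""


def call_with_backoff(
    send: Callable[[str], str],
    models: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order; retry rate limits with exponential backoff."""
    for model in models:
        for attempt in range(max_attempts):
            try:
                return send(model)
            except RateLimited:
                # Wait base_delay, then 2x, then 4x ... before retrying this model.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
        # Every attempt for this model was rate limited; move to the next one.
    raise RateLimited(f"all models rate limited: {list(models)}")
```

Passing the model list as `[primary, cheaper_fallback]` gives you both behaviors from the paragraph above: a bounded number of spaced retries, then a switch to the fallback instead of hammering the same endpoint.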

Step 3: Create the GitHub Actions Workflow

Create .github/workflows/codex-review.yml:

name: Codex PR Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

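      - name: Build diff
        env:
          # Pass the branch name through env instead of expanding it inside the script.
          BASE_REF: ${{ github.base_ref }}
        run: |
          # fetch-depth: 0 above already fetched the base branch, so no extra fetch is needed.
          git diff --unified=80 "origin/$BASE_REF...HEAD" > /tmp/pr.diff
          wc -c /tmp/pr.diff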

      - name: Trim huge diffs
        run: |
          head -c 120000 /tmp/pr.diff > /tmp/pr.trimmed.diff

      - name: Install Codex CLI
        run: npm install -g @openai/codex

      - name: Run Codex review
        env:
          OPENAI_API_KEY: ${{ secrets.AI_API_KEY }}
          OPENAI_BASE_URL: ${{ secrets.AI_BASE_URL }}
          REVIEW_MODEL: ${{ secrets.AI_REVIEW_MODEL }}
        run: |
          cat > /tmp/review_prompt.txt <<'PROMPT'
          You are reviewing a GitHub pull request.
          Be concise. Focus on bugs, broken tests, security issues,
          risky migrations, and missing edge cases.
          Do not praise. Do not rewrite the whole patch.
          Return Markdown with:
          1. Summary
          2. High-risk issues
          3. Suggested tests
          4. Small fixes
          PROMPT

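          # Send the prompt on stdin ("-"): a large diff can exceed argv limits, and
          # "\n" inside double quotes is not a newline. Confirm with `codex exec --help`.
          { cat /tmp/review_prompt.txt; printf '\n\nDIFF:\n'; cat /tmp/pr.trimmed.diff; } \
            | codex exec --model "${REVIEW_MODEL:-gpt-5.5-mini}" - \
            > /tmp/codex-review.md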

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('/tmp/codex-review.md', 'utf8');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `## Codex CI Review\n\n${body}`
            });

That’s the basic loop: checkout, build a diff, trim it, ask Codex for a review, then post the result as a PR comment.
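One weakness of the trim step is that `head -c` cuts at a byte boundary, so the diff can end in the middle of a hunk. A possible alternative, sketched here under assumptions, is to drop generated files first and then keep whole per-file chunks until the budget is used. The `SKIP_PATTERNS` list and the function names are illustrative, not part of Codex CLI or the workflow above.

```python
import fnmatch

# Assumed patterns for generated files; adjust to your repository.
SKIP_PATTERNS = ["*.lock", "package-lock.json", "*.min.js", "*.snap"]


def split_by_file(diff: str) -> list[str]:
    """Split a git diff into one chunk per 'diff --git' header."""
    chunks, current = [], []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def chunk_path(chunk: str) -> str:
    """Return the 'b/' path from a 'diff --git a/x b/x' header line."""
    header = chunk.split("\n", 1)[0]
    return header.rsplit(" b/", 1)[-1]


def trim_diff(diff: str, max_bytes: int = 120_000) -> str:
    """Keep whole per-file chunks, skipping generated files, within max_bytes."""
    kept, used = [], 0
    for chunk in split_by_file(diff):
        name = chunk_path(chunk).rsplit("/", 1)[-1]
        if any(fnmatch.fnmatch(name, pattern) for pattern in SKIP_PATTERNS):
            continue  # generated file: not worth review tokens
        size = len(chunk.encode("utf-8"))
        if used + size > max_bytes:
            continue  # does not fit: drop the whole file rather than cut a hunk
        kept.append(chunk)
        used += size
    return "".join(kept)
```

You could run this in place of the `head -c` step by reading `/tmp/pr.diff` and writing the result to `/tmp/pr.trimmed.diff`. The trade-off is that one oversized file is dropped entirely instead of being partially reviewed.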

Step 4: Add a Safer Review Prompt

The prompt matters more than people admit. A vague “review this PR” prompt tends to produce generic advice. A CI prompt should be blunt and bounded:

You are a CI code reviewer. Only comment on issues that are visible in the diff.
Prioritize correctness, security, data loss, backwards compatibility, and tests.
If the diff is too small to judge, say what context is missing.
Do not request style changes unless they affect behavior.
Do not claim you ran tests.

That last line is important. Agents love sounding more certain than they are. In CI, false confidence wastes reviewer time.

Step 5: Use Python for Test Failure Triage

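Once PR review works, add a triage step that runs only when tests fail (for example, with if: failure()). Keep it in the same job as the tests, or upload the log as an artifact first, because separate jobs do not share /tmp. Assuming the test step wrote its output to /tmp/test.log, this Python script sends the last part of that log to an OpenAI-compatible API: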

from openai import OpenAI
import os
from pathlib import Path

client = OpenAI(
    api_key=os.environ["AI_API_KEY"],
    base_url=os.environ.get("AI_BASE_URL", "https://api.openai.com/v1"),
)

log = Path("/tmp/test.log").read_text(errors="ignore")[-50000:]

resp = client.chat.completions.create(
    model=os.environ.get("AI_REVIEW_MODEL", "gpt-5.5-mini"),
    messages=[{
        "role": "user",
        "content": "Explain this CI failure. Give likely cause and next debugging step.\n\n" + log
    }],
    temperature=0.2,
)

print(resp.choices[0].message.content)

This is often more useful than a full agent run. Logs already contain the evidence. The model just needs to compress it into something a developer can act on.

Step 6: Use Node.js for PR Metadata

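The same review call works from Node.js. The script below sends only the trimmed diff; PR title, labels, changed files, and commit messages can be prepended to the prompt in the same way. Save it as an .mjs file (or set "type": "module" in package.json) so the import syntax and top-level await work: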

import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  apiKey: process.env.AI_API_KEY,
  baseURL: process.env.AI_BASE_URL || "https://api.openai.com/v1",
});

const diff = fs.readFileSync("/tmp/pr.trimmed.diff", "utf8");

const response = await client.chat.completions.create({
  model: process.env.AI_REVIEW_MODEL || "gpt-5.5-mini",
  messages: [{
    role: "user",
    content: `Review this PR diff for production risk. Be specific.\n\n${diff}`,
  }],
  max_tokens: 900,
});

console.log(response.choices[0].message.content);

The Node route is handy if your repo already has GitHub API utilities.

Cost Controls That Actually Work

CI agents can quietly burn money because every push triggers a new run. Put limits in place before the first invoice surprises you.

Trim diffs. As a rule of thumb, the first 50-120 KB of diff context is enough for useful review comments.
Skip generated files. Exclude lockfiles, snapshots, minified bundles, and generated clients unless that’s the point of the PR.
Use a cheaper review model first. Escalate only for high-risk folders like auth, billing, migrations, or permissions.
Cancel old runs. Use GitHub Actions concurrency so a new push cancels the previous review.
Route by task. A gateway such as KissAPI can keep the same API format while you move PR review, log triage, and deep reasoning to different models.

concurrency:
  group: codex-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true
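The "cheaper model first, escalate for high-risk folders" rule from the list above can be a small function that runs before the review call. This is a sketch under assumptions: the directory names mirror the examples in the list, the model names are the placeholders used elsewhere in this guide, and `pick_model` is a hypothetical helper.

```python
# Assumed high-risk directory names; match them to your repository layout.
HIGH_RISK_DIRS = {"auth", "billing", "migrations", "permissions"}


def pick_model(
    changed_paths,
    cheap: str = "gpt-5.5-mini",
    strong: str = "claude-sonnet-4-6",
) -> str:
    """Return the stronger model only if a changed file sits in a high-risk directory."""
    for path in changed_paths:
        directories = path.split("/")[:-1]  # ignore the file name itself
        if HIGH_RISK_DIRS.intersection(directories):
            return strong
    return cheap
```

Feed it the output of `git diff --name-only` split into lines, then export the result as the review model for the rest of the job. Matching on whole directory names avoids false hits such as `author/` matching `auth`.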

Common Failure Modes

The workflow works locally but fails in Actions

Check environment variable names first. Many CLIs expect OPENAI_API_KEY, while your secret may be named AI_API_KEY. Map it explicitly in the workflow.
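If you want the job to fail early with a readable message instead of a vague CLI error, a tiny preflight check can do the mapping. This is a sketch: `resolve_api_env` is a hypothetical helper, and the variable names are the ones used in this guide.

```python
import os
from typing import Mapping


def resolve_api_env(env: Mapping[str, str] = os.environ) -> dict:
    """Return the variables the CLI expects, built from either naming scheme."""
    api_key = env.get("OPENAI_API_KEY") or env.get("AI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY or AI_API_KEY in the step's env block")
    resolved = {"OPENAI_API_KEY": api_key}
    base_url = env.get("OPENAI_BASE_URL") or env.get("AI_BASE_URL")
    if base_url:
        # Strip a trailing slash so "/chat/completions" can be appended cleanly.
        resolved["OPENAI_BASE_URL"] = base_url.rstrip("/")
    return resolved
```

An unset GitHub secret arrives as an empty string, which is why the check uses truthiness rather than testing for a missing key.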

The PR comment is too long

Ask for a capped review: “maximum 12 bullets” or “only high-risk issues.” You can also truncate the output before posting, but it’s better to make the model concise.
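As a safety net behind the prompt cap, the output can be truncated before posting. A minimal sketch, assuming a 60,000-character budget chosen to stay under GitHub's comment size limit (check the current limit for your setup); `truncate_comment` is a hypothetical helper.

```python
MAX_COMMENT_CHARS = 60_000  # assumed margin below GitHub's comment size limit


def truncate_comment(body: str, limit: int = MAX_COMMENT_CHARS) -> str:
    """Cut an over-long review at a line break and append a short notice."""
    notice = "\n\n_Review truncated._"
    if len(body) <= limit:
        return body
    cut = body[: limit - len(notice)]
    # Prefer the last line break so a bullet is not split mid-sentence.
    newline = cut.rfind("\n")
    if newline > 0:
        cut = cut[:newline]
    return cut + notice
```

Run it on the contents of `/tmp/codex-review.md` before the comment step reads the file.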

The model comments on unchanged code

Tell it to review only visible diff lines. If you send full files, separate them clearly from the patch and ask for evidence.
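One way to keep the patch and full-file context apart is to wrap each in explicit markers when building the prompt. The marker strings and `build_prompt` below are illustrative assumptions, not a format Codex CLI requires.

```python
def build_prompt(diff: str, files: dict[str, str]) -> str:
    """Build a prompt with the patch and context files in clearly labeled sections."""
    parts = [
        "Review only the lines changed in the PATCH section.",
        "FILE sections are read-only context; cite a PATCH line for every issue.",
        "",
        "=== PATCH ===",
        diff.rstrip("\n"),
        "=== END PATCH ===",
    ]
    # Sort paths so the prompt is stable from run to run.
    for path, text in sorted(files.items()):
        parts += [
            "",
            f"=== FILE {path} (context only) ===",
            text.rstrip("\n"),
            f"=== END FILE {path} ===",
        ]
    return "\n".join(parts) + "\n"
```

Asking the model to cite a PATCH line for each issue gives you something to check: a comment with no citation is a candidate for the model drifting into unchanged code.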

Forked PRs cannot access secrets

That’s GitHub protecting you. Don’t bypass it casually. For open source repos, run a no-secret lint workflow on forks and reserve AI review for trusted branches or maintainer-triggered workflows.

Final Recommendation

Start small: one PR comment, read-only access to repository contents, trimmed diffs, a cheap model, and a ten-minute timeout. If the comments save reviewers time for a week, expand into test triage. If the comments are noisy, tighten the prompt before changing models.

Codex CLI in GitHub Actions should feel like a junior reviewer who never gets tired, not an unsupervised maintainer. Keep humans in the merge path and it becomes a useful CI layer instead of another source of chaos.