Codex CLI GitHub Actions Setup Guide 2026: Run AI Code Reviews in CI

Running Codex CLI on your laptop is useful. Running it inside GitHub Actions is where it starts to feel like infrastructure. Every pull request can get a short review, every failed test run can get a first-pass diagnosis, and every release branch can get a risk summary before a human merges it.

The trick is not “let the agent edit production code whenever it wants.” That’s a bad idea. The better pattern is narrower: use Codex CLI as a read-heavy CI reviewer, keep secrets locked in GitHub Actions, cap token spend, and require humans to apply patches.

What Codex CLI Should Do in CI

A CI coding agent should have boring jobs. Boring is good here. You want repeatable comments, not a mysterious robot pushing commits at 3 a.m.

Use case | Good CI behavior | Avoid
Pull request review | Summarize risky diffs and missing tests | Rewriting the branch automatically
Test failure triage | Explain likely cause from logs | Guessing without logs
Security review | Flag obvious auth, injection, and secret leaks | Replacing real security tooling
Release notes | Draft notes from merged commits | Inventing customer impact

My default recommendation: start with pull request review only. Once the output is consistently useful, add test failure triage. Save automated patch generation for internal repos where rollbacks are cheap.

Prerequisites

If your provider supports an OpenAI-compatible endpoint, this setup is easier. For example, KissAPI lets you point OpenAI-style SDKs and tools at one base URL, then route requests across models without changing CI YAML every time pricing or availability changes.

Step 1: Add GitHub Secrets

In GitHub, open Settings → Secrets and variables → Actions → New repository secret. Add the three secrets the workflow below reads: AI_API_KEY, AI_BASE_URL, and AI_REVIEW_MODEL.

Never put the key directly in YAML. Also avoid running this workflow on untrusted fork pull requests with full secret access. For public repos, use pull_request without secrets, or use pull_request_target only if you really understand the security model.

Step 2: Smoke-Test the API Before Running the Agent

A quick API check saves noisy CI failures. Add this near the start of the workflow:

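# --fail-with-body makes curl exit non-zero on HTTP errors (401, 404, 429)
# while still printing the response body. It needs curl 7.76 or newer.
curl -sS --fail-with-body "$AI_BASE_URL/chat/completions" \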
  -H "Authorization: Bearer $AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AI_REVIEW_MODEL:-gpt-5.5-mini}"'",
    "messages": [{"role":"user","content":"Reply with ok."}],
    "max_tokens": 10
  }'

If this fails with 401, your key is wrong. If it fails with 404, your base URL probably has the wrong path. If it fails with 429, don’t keep retrying blindly. Add backoff or switch to a cheaper fallback model.
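
The status handling described above can be sketched in Python. This is a minimal illustration, not part of the workflow: smoke_test and its send callable are hypothetical names, and send stands in for whatever makes the HTTP request and returns the status code. It stops immediately on 401 and 404, backs off exponentially on 429, and moves to the next model once the retries are used up.

```python
import time
from typing import Callable, Sequence


def smoke_test(
    send: Callable[[str], int],
    models: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the first model that answers 200, or raise RuntimeError.

    `send` is a hypothetical callable: it takes a model name, performs the
    request, and returns the HTTP status code.
    """
    for model in models:
        for attempt in range(max_attempts):
            status = send(model)
            if status == 200:
                return model
            if status == 401:
                # Retrying cannot fix a rejected key.
                raise RuntimeError("401: API key rejected; check the secret")
            if status == 404:
                raise RuntimeError("404: check the base URL path")
            if status == 429:
                # Rate limited: back off exponentially (1s, 2s, 4s by default).
                sleep(base_delay * 2 ** attempt)
                continue
            raise RuntimeError(f"unexpected status {status}")
        # Still rate limited after max_attempts: try the next (cheaper) model.
    raise RuntimeError("all models rate limited")
```

Passing sleep as a parameter keeps the function testable without real delays. In CI you would call it with the primary model first and a cheaper fallback second.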

Step 3: Create the GitHub Actions Workflow

Create .github/workflows/codex-review.yml:

name: Codex PR Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

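      - name: Build diff
        env:
          # Pass the branch name through env instead of inlining it in the script.
          BASE_REF: ${{ github.base_ref }}
        run: |
          # fetch-depth: 0 already pulled full history. A plain fetch (no --depth)
          # keeps the merge base available for the three-dot diff.
          git fetch origin "$BASE_REF"
          git diff --unified=80 "origin/$BASE_REF...HEAD" > /tmp/pr.diff
          wc -c /tmp/pr.diff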

      - name: Trim huge diffs
        run: |
          head -c 120000 /tmp/pr.diff > /tmp/pr.trimmed.diff

      - name: Install Codex CLI
        run: npm install -g @openai/codex

      - name: Run Codex review
        env:
          OPENAI_API_KEY: ${{ secrets.AI_API_KEY }}
          OPENAI_BASE_URL: ${{ secrets.AI_BASE_URL }}
          REVIEW_MODEL: ${{ secrets.AI_REVIEW_MODEL }}
        run: |
          cat > /tmp/review_prompt.txt <<'PROMPT'
          You are reviewing a GitHub pull request.
          Be concise. Focus on bugs, broken tests, security issues,
          risky migrations, and missing edge cases.
          Do not praise. Do not rewrite the whole patch.
          Return Markdown with:
          1. Summary
          2. High-risk issues
          3. Suggested tests
          4. Small fixes
          PROMPT

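          # Join the instructions and the diff with real newlines. Inside double
          # quotes, bash treats "\n" as a literal backslash followed by n.
          { cat /tmp/review_prompt.txt; printf '\n\nDIFF:\n'; cat /tmp/pr.trimmed.diff; } > /tmp/review_input.txt

          # Assumes this Codex CLI version takes the prompt as a positional
          # argument and reads it from stdin when given "-".
          # Confirm with: codex exec --help
          codex exec \
            --model "${REVIEW_MODEL:-gpt-5.5-mini}" \
            - < /tmp/review_input.txt \
            > /tmp/codex-review.md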

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('/tmp/codex-review.md', 'utf8');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `## Codex CI Review\n\n${body}`
            });

That’s the basic loop: checkout, build a diff, trim it, ask Codex for a review, then post the result as a PR comment.
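
The trim step in the workflow cuts at a fixed byte count, which can stop in the middle of a hunk. A gentler variant, sketched here in Python, drops whole per-file sections instead. trim_diff is a hypothetical helper, and it assumes standard git diff output in which every file section starts with a "diff --git " line.

```python
def trim_diff(diff: str, max_bytes: int = 120_000) -> str:
    """Keep whole per-file sections of a git diff until the byte budget is hit."""
    # Split the diff into sections, one per "diff --git " header.
    sections: list[str] = []
    current: list[str] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    # Keep sections in order and stop at the first one that would overflow.
    kept: list[str] = []
    used = 0
    for section in sections:
        size = len(section.encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(section)
        used += size

    omitted = len(sections) - len(kept)
    note = (
        f"\n[{omitted} file section(s) omitted to fit the size limit]\n"
        if omitted
        else ""
    )
    return "".join(kept) + note
```

The trailing note tells the model that context is missing, so it is less likely to treat the visible part as the whole change.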

Step 4: Add a Safer Review Prompt

The prompt matters more than people admit. A vague “review this PR” prompt tends to produce generic advice. A CI prompt should be blunt and bounded:

You are a CI code reviewer. Only comment on issues that are visible in the diff.
Prioritize correctness, security, data loss, backwards compatibility, and tests.
If the diff is too small to judge, say what context is missing.
Do not request style changes unless they affect behavior.
Do not claim you ran tests.

That last line is important. Agents love sounding more certain than they are. In CI, false confidence wastes reviewer time.

Step 5: Use Python for Test Failure Triage

Once PR review works, add a second job that only runs when tests fail. This Python script sends the last part of the test log to an OpenAI-compatible API:

from openai import OpenAI
import os
from pathlib import Path

client = OpenAI(
    api_key=os.environ["AI_API_KEY"],
    base_url=os.environ.get("AI_BASE_URL", "https://api.openai.com/v1"),
)

log = Path("/tmp/test.log").read_text(errors="ignore")[-50000:]

resp = client.chat.completions.create(
    model=os.environ.get("AI_REVIEW_MODEL", "gpt-5.5-mini"),
    messages=[{
        "role": "user",
        "content": "Explain this CI failure. Give likely cause and next debugging step.\n\n" + log
    }],
    temperature=0.2,
)

print(resp.choices[0].message.content)

This is often more useful than a full agent run. Logs already contain the evidence. The model just needs to compress it into something a developer can act on.

Step 6: Use Node.js for PR Metadata

The same review can run from Node.js. The script below sends only the trimmed diff; PR title, labels, changed files, and commit messages can be prepended to the prompt in the same way:

import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  apiKey: process.env.AI_API_KEY,
  baseURL: process.env.AI_BASE_URL || "https://api.openai.com/v1",
});

const diff = fs.readFileSync("/tmp/pr.trimmed.diff", "utf8");

const response = await client.chat.completions.create({
  model: process.env.AI_REVIEW_MODEL || "gpt-5.5-mini",
  messages: [{
    role: "user",
    content: `Review this PR diff for production risk. Be specific.\n\n${diff}`,
  }],
  max_tokens: 900,
});

console.log(response.choices[0].message.content);

The Node route is handy if your repo already has GitHub API utilities.

Cost Controls That Actually Work

CI agents can quietly burn money because every push triggers a new run. Put limits in place before the first invoice surprises you.

concurrency:
  group: codex-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true
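
Cancelling superseded runs is one limit. Another is to decide, before calling the model, whether a diff is worth sending at all. The Python sketch below is a rough guard under stated assumptions: plan_review is a hypothetical helper, the four-characters-per-token ratio is a coarse heuristic rather than a tokenizer, and the 30,000-token budget is a placeholder to tune.

```python
def plan_review(
    diff_chars: int,
    max_input_tokens: int = 30_000,  # placeholder budget, tune per repo
    chars_per_token: int = 4,        # coarse heuristic, not a real tokenizer
) -> dict:
    """Decide whether to run the review based on an estimated input size."""
    if diff_chars <= 0:
        return {"run": False, "reason": "empty diff", "estimated_tokens": 0}

    # Ceiling division so a partial token still counts as one.
    estimated = -(-diff_chars // chars_per_token)
    if estimated > max_input_tokens:
        return {"run": False, "reason": "over budget", "estimated_tokens": estimated}
    return {"run": True, "reason": "within budget", "estimated_tokens": estimated}
```

A workflow step could call this with the byte count from wc -c and skip the review step when run is false.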

Common Failure Modes

The workflow works locally but fails in Actions

Check environment variable names first. Many CLIs expect OPENAI_API_KEY, while your secret may be named AI_API_KEY. Map it explicitly in the workflow.

The PR comment is too long

Ask for a capped review: “maximum 12 bullets” or “only high-risk issues.” You can also truncate the output before posting, but it’s better to make the model concise.
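
If you do truncate as a safety net, cut on a line break and say so in the comment. A minimal Python sketch follows. truncate_comment is a hypothetical helper, and the default limit assumes GitHub's 65,536-character cap on comment bodies.

```python
GITHUB_COMMENT_LIMIT = 65_536  # assumed maximum comment body length


def truncate_comment(body: str, limit: int = GITHUB_COMMENT_LIMIT) -> str:
    """Return body unchanged if it fits, else cut it and append a notice."""
    notice = "\n\n_Review truncated to fit the comment size limit._"
    if len(body) <= limit:
        return body

    # Leave room for the notice, then back up to the last line break so a
    # bullet is not cut in half.
    cut = body[: limit - len(notice)]
    newline = cut.rfind("\n")
    if newline > 0:
        cut = cut[:newline]
    return cut + notice
```

The workflow above prepends a heading to the review before posting, so pass a limit slightly below the cap to leave room for it.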

The model comments on unchanged code

Tell it to review only visible diff lines. If you send full files, separate them clearly from the patch and ask for evidence.
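
One way to keep the patch and any full files apart is to wrap each in labeled delimiters. The Python sketch below shows the idea. build_review_input and the "=== PATCH ===" markers are hypothetical choices, not a format the model or Codex CLI requires.

```python
from typing import Optional


def build_review_input(diff: str, context_files: Optional[dict] = None) -> str:
    """Assemble a prompt with the patch and background files clearly separated."""
    parts = [
        "Review ONLY the lines changed in the PATCH section.",
        "CONTEXT files are background. Do not comment on them unless a changed",
        "line breaks them, and quote the changed line as evidence.",
        "",
        "=== PATCH ===",
        diff.rstrip("\n"),
        "=== END PATCH ===",
    ]
    # Sort paths so the prompt is identical across runs for the same input.
    for path, text in sorted((context_files or {}).items()):
        parts += [
            "",
            f"=== CONTEXT: {path} ===",
            text.rstrip("\n"),
            f"=== END CONTEXT: {path} ===",
        ]
    return "\n".join(parts) + "\n"
```

Stable ordering also makes it easier to compare reviews between pushes.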

Forked PRs cannot access secrets

That’s GitHub protecting you. Don’t bypass it casually. For open source repos, run a no-secret lint workflow on forks and reserve AI review for trusted branches or maintainer-triggered workflows.
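
A workflow script can detect a fork before it tries to use any secret. The Python sketch below compares repository names in the pull_request event payload. is_fork_pr is a hypothetical helper, and it treats a missing head repository (for example, a deleted fork) as untrusted.

```python
def is_fork_pr(event: dict) -> bool:
    """Return True if the pull request's head branch lives in another repository."""
    pr = event.get("pull_request")
    if not pr:
        # Not a pull request event, so there is no fork to worry about.
        return False

    head_repo = pr.get("head", {}).get("repo")
    if head_repo is None:
        # Head repository unknown or deleted: treat as untrusted.
        return True

    base_name = event.get("repository", {}).get("full_name")
    return head_repo.get("full_name") != base_name
```

In GitHub Actions, the payload is the JSON file at the path stored in the GITHUB_EVENT_PATH environment variable. Load it with json.load and skip the AI review when the function returns True.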

Final Recommendation

Start small: one PR comment, read-only access to repository contents, trimmed diffs, a cheap model, and a ten-minute timeout. If the comments save reviewers time for a week, expand into test triage. If the comments are noisy, tighten the prompt before changing models.

Codex CLI in GitHub Actions should feel like a junior reviewer who never gets tired, not an unsupervised maintainer. Keep humans in the merge path and it becomes a useful CI layer instead of another source of chaos.