Codex CLI GitHub Actions Setup Guide 2026: Run AI Code Reviews in CI
Running Codex CLI on your laptop is useful. Running it inside GitHub Actions is where it starts to feel like infrastructure. Every pull request can get a short review, every failed test run can get a first-pass diagnosis, and every release branch can get a risk summary before a human merges it.
The trick is not “let the agent edit production code whenever it wants.” That’s a bad idea. The better pattern is narrower: use Codex CLI as a read-heavy CI reviewer, keep secrets locked in GitHub Actions, cap token spend, and require humans to apply patches.
What Codex CLI Should Do in CI
A CI coding agent should have boring jobs. Boring is good here. You want repeatable comments, not a mysterious robot pushing commits at 3 a.m.
| Use case | Good CI behavior | Avoid |
|---|---|---|
| Pull request review | Summarize risky diffs and missing tests | Rewriting the branch automatically |
| Test failure triage | Explain likely cause from logs | Guessing without logs |
| Security review | Flag obvious auth, injection, and secret leaks | Replacing real security tooling |
| Release notes | Draft notes from merged commits | Inventing customer impact |
My default recommendation: start with pull request review only. Once the output is consistently useful, add test failure triage. Save automated patch generation for internal repos where rollbacks are cheap.
Prerequisites
- A repository using GitHub Actions.
- A Codex CLI install command that works in a clean Linux runner.
- An API key stored as a GitHub Actions secret.
- A model choice for review tasks. Don’t use your most expensive reasoning model for every diff.
If your provider supports an OpenAI-compatible endpoint, this setup is easier. For example, KissAPI lets you point OpenAI-style SDKs and tools at one base URL, then route requests across models without changing CI YAML every time pricing or availability changes.
Step 1: Add GitHub Secrets
In GitHub, open Settings → Secrets and variables → Actions → New repository secret. Add:
- `AI_API_KEY`: your API key
- `AI_BASE_URL`: optional, for gateways such as `https://api.kissapi.ai/v1`
- `AI_REVIEW_MODEL`: optional model name, such as `gpt-5.5-mini` or `claude-sonnet-4-6`
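Never put the key directly in YAML. Also avoid running this workflow on untrusted fork pull requests with full secret access. For public repos, let fork PRs run under `pull_request`, which does not pass repository secrets to workflows triggered from forks, and use `pull_request_target` only if you really understand its security model: it runs in the context of the base repository, with access to secrets.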
Step 2: Smoke-Test the API Before Running the Agent
A quick API check saves you from noisy CI failures. Add this as an early step in the workflow, with the secrets mapped into that step's `env:`:
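```bash
# Assumes AI_API_KEY (and optionally AI_BASE_URL / AI_REVIEW_MODEL) are mapped from secrets into env.
# --fail-with-body makes curl exit non-zero on HTTP 4xx/5xx while still printing the error body.
curl -sS --fail-with-body "${AI_BASE_URL:-https://api.openai.com/v1}/chat/completions" \
  -H "Authorization: Bearer $AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AI_REVIEW_MODEL:-gpt-5.5-mini}"'",
    "messages": [{"role":"user","content":"Reply with ok."}],
    "max_tokens": 10
  }'
```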
If this fails with 401, your key is wrong. If it fails with 404, your base URL probably has the wrong path. If it fails with 429, don’t keep retrying blindly. Add backoff or switch to a cheaper fallback model.
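If the smoke test lives in a script rather than raw `curl`, the 429 advice above becomes a small, bounded retry loop. A minimal Python sketch: `call_with_backoff` and its `send` callable (anything that performs the request and returns the HTTP status code) are hypothetical names, and the attempt count and delays are illustrative rather than tuned. It deliberately does not retry 401 or 404, since a wrong key or path will not fix itself.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() and retry only on HTTP 429, waiting exponentially longer each time.

    Any other status (200, 401, 404, ...) is returned immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status
        # 1s, 2s, 4s, ... capped at max_delay, plus up to 50% random jitter
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay / 2))
```

If the final status is still 429, that is the moment to switch to the cheaper fallback model instead of adding more attempts.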
Step 3: Create the GitHub Actions Workflow
Create .github/workflows/codex-review.yml:
```yaml
name: Codex PR Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the three-dot diff below can find the merge base

      - name: Build diff
        env:
          BASE_REF: ${{ github.base_ref }}  # passed via env rather than interpolated into the script
        run: |
          # No --depth here: fetch-depth: 0 already gave full history, and a depth-limited fetch could truncate it.
          git fetch origin "$BASE_REF"
          git diff --unified=80 "origin/$BASE_REF...HEAD" > /tmp/pr.diff
          wc -c /tmp/pr.diff

      - name: Trim huge diffs
        run: |
          head -c 120000 /tmp/pr.diff > /tmp/pr.trimmed.diff

      - name: Install Codex CLI
        run: npm install -g @openai/codex

      - name: Run Codex review
        env:
          OPENAI_API_KEY: ${{ secrets.AI_API_KEY }}
          # An unset secret expands to "", so fall back to the default endpoint explicitly.
          OPENAI_BASE_URL: ${{ secrets.AI_BASE_URL || 'https://api.openai.com/v1' }}
          REVIEW_MODEL: ${{ secrets.AI_REVIEW_MODEL }}
        run: |
          cat > /tmp/review_prompt.txt <<'PROMPT'
          You are reviewing a GitHub pull request.
          Be concise. Focus on bugs, broken tests, security issues,
          risky migrations, and missing edge cases.
          Do not praise. Do not rewrite the whole patch.
          Return Markdown with:
          1. Summary
          2. High-risk issues
          3. Suggested tests
          4. Small fixes
          PROMPT
          # Join prompt and diff in a file and pass it on stdin ("-"): "\n" inside
          # double quotes is not a newline in bash, and a ~120 KB argument is fragile.
          { cat /tmp/review_prompt.txt; printf '\nDIFF:\n'; cat /tmp/pr.trimmed.diff; } > /tmp/codex-input.txt
          codex exec --model "${REVIEW_MODEL:-gpt-5.5-mini}" - < /tmp/codex-input.txt > /tmp/codex-review.md

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            // GitHub rejects comment bodies over 65,536 characters, so cap the review text.
            const body = fs.readFileSync('/tmp/codex-review.md', 'utf8').slice(0, 60000);
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `## Codex CI Review\n\n${body}`
            });
```
That’s the basic loop: checkout, build a diff, trim it, ask Codex for a review, then post the result as a PR comment.
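The `head -c` trim is blunt: it can stop halfway through a hunk, and the model then reviews a truncated change as if it were complete. A minimal Python sketch of an alternative, assuming the diff came from `git diff` (so every file section starts with a `diff --git` line), keeps whole file sections until a byte budget is spent and reports which files were dropped. The function names are hypothetical.

```python
def split_diff_by_file(diff: str) -> list[str]:
    """Split a unified `git diff` into per-file sections, each starting with 'diff --git'."""
    sections: list[str] = []
    current: list[str] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def trim_on_file_boundaries(diff: str, budget_bytes: int = 120_000) -> tuple[str, list[str]]:
    """Keep whole file sections while they fit in budget_bytes (measured as UTF-8).

    Greedy fill: a later, smaller file can still fit after a large one is skipped.
    Returns the kept diff and the header line of every skipped section.
    """
    kept: list[str] = []
    skipped: list[str] = []
    used = 0
    for section in split_diff_by_file(diff):
        size = len(section.encode("utf-8"))
        if used + size <= budget_bytes:
            kept.append(section)
            used += size
        else:
            skipped.append(section.splitlines()[0])
    return "".join(kept), skipped
```

Listing the skipped headers in the prompt ("these files were omitted for size") keeps the model from assuming it saw the whole change.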
Step 4: Add a Safer Review Prompt
The prompt matters more than people admit. A vague “review this PR” prompt tends to produce generic advice. A CI prompt should be blunt and bounded:
```text
You are a CI code reviewer. Only comment on issues that are visible in the diff.
Prioritize correctness, security, data loss, backwards compatibility, and tests.
If the diff is too small to judge, say what context is missing.
Do not request style changes unless they affect behavior.
Do not claim you ran tests.
```
That last line is important. Agents love sounding more certain than they are. In CI, false confidence wastes reviewer time.
Step 5: Use Python for Test Failure Triage
Once PR review works, add a second job that only runs when tests fail. This Python script sends the last part of the test log to an OpenAI-compatible API:
```python
import os
from pathlib import Path

from openai import OpenAI

# Unset repository secrets expand to empty strings in Actions, so treat "" as missing.
client = OpenAI(
    api_key=os.environ["AI_API_KEY"],
    base_url=os.environ.get("AI_BASE_URL") or "https://api.openai.com/v1",
)

# Assumes an earlier step saved the test output to /tmp/test.log; keep the last 50,000 characters.
log = Path("/tmp/test.log").read_text(errors="ignore")[-50000:]

resp = client.chat.completions.create(
    model=os.environ.get("AI_REVIEW_MODEL") or "gpt-5.5-mini",
    messages=[{
        "role": "user",
        "content": "Explain this CI failure. Give likely cause and next debugging step.\n\n" + log,
    }],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```
This is often more useful than a full agent run. Logs already contain the evidence. The model just needs to compress it into something a developer can act on.
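A plain `[-50000:]` slice works, but the decisive line is often an assertion error sitting far above a long teardown. One option, sketched under the assumption that failure lines contain markers such as `FAIL`, `Error`, `Traceback`, or `panicked` (adjust the pattern to your test runner), is to send the lines around those markers plus the tail of the log. `extract_failure_context` is a hypothetical helper, not part of any library.

```python
import re

# Hypothetical marker pattern; tune it to your test runner's output format.
FAILURE_MARKERS = re.compile(r"FAIL|Error|Traceback|panicked")


def extract_failure_context(log: str, context: int = 3, tail_lines: int = 200,
                            max_chars: int = 50_000) -> str:
    """Return lines near failure markers plus the last tail_lines lines, capped at max_chars."""
    lines = log.splitlines()
    keep: set[int] = set(range(max(0, len(lines) - tail_lines), len(lines)))
    for i, line in enumerate(lines):
        if FAILURE_MARKERS.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    excerpt = "\n".join(lines[i] for i in sorted(keep))
    # If the excerpt is still too long, keep its end: the final summary usually lives there.
    return excerpt[-max_chars:]
```

The result replaces `log` in the prompt; everything else in the triage script stays the same.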
Step 6: Use Node.js for PR Metadata
You can also review PR title, labels, changed files, and commit messages before sending the diff:
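```javascript
// Save as an ES module (e.g. review.mjs) so top-level await works; needs `npm install openai`.
import OpenAI from "openai";
import fs from "node:fs";
import { execSync } from "node:child_process";

const client = new OpenAI({
  apiKey: process.env.AI_API_KEY,
  baseURL: process.env.AI_BASE_URL || "https://api.openai.com/v1",
});

// PR title and labels come from the event payload Actions writes to GITHUB_EVENT_PATH.
const event = JSON.parse(fs.readFileSync(process.env.GITHUB_EVENT_PATH, "utf8"));
const pr = event.pull_request ?? {};
const labels = (pr.labels ?? []).map((label) => label.name).join(", ") || "none";
// Commit subjects on the PR branch; GITHUB_BASE_REF is set for pull_request events.
const commits = execSync(`git log --format=%s origin/${process.env.GITHUB_BASE_REF}..HEAD`, { encoding: "utf8" });
// The diff itself already lists the changed files.
const diff = fs.readFileSync("/tmp/pr.trimmed.diff", "utf8");

const response = await client.chat.completions.create({
  model: process.env.AI_REVIEW_MODEL || "gpt-5.5-mini",
  messages: [{
    role: "user",
    content: `Review this PR for production risk. Be specific.\n\nTitle: ${pr.title ?? ""}\nLabels: ${labels}\nCommits:\n${commits}\nDIFF:\n${diff}`,
  }],
  max_tokens: 900,
});
console.log(response.choices[0].message.content);
```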
The Node route is handy if your repo already has GitHub API utilities.
Cost Controls That Actually Work
CI agents can quietly burn money because every push triggers a new run. Put limits in place before the first invoice surprises you.
- Trim diffs. Most useful review comments come from the first 50-120 KB of diff context.
- Skip generated files. Exclude lockfiles, snapshots, minified bundles, and generated clients unless that’s the point of the PR.
- Use a cheaper review model first. Escalate only for high-risk folders like auth, billing, migrations, or permissions.
- Cancel old runs. Use GitHub Actions concurrency so a new push cancels the previous review.
- Route by task. A gateway such as KissAPI can keep the same API format while you move PR review, log triage, and deep reasoning to different models.
```yaml
# Top level of the workflow file, next to `on:` and `jobs:`
concurrency:
  group: codex-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true
```
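The "skip generated files" and "escalate only for high-risk folders" rules can be expressed as a small routing step that runs before the review. A sketch under stated assumptions: the glob patterns and folder names below are examples, `pick_model` is a hypothetical helper, and `cheap_model` / `deep_model` are whatever names your provider exposes, passed in rather than hard-coded.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Example patterns only; extend them for your stack.
GENERATED_PATTERNS = [
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "poetry.lock", "Cargo.lock", "go.sum",
    "*.snap", "*.min.js", "*.min.css",
]
GENERATED_DIRS = {"__snapshots__", "generated"}
HIGH_RISK_DIRS = {"auth", "billing", "migrations", "permissions"}


def is_generated(path: str) -> bool:
    """True for lockfiles, snapshots, minified bundles, and files under generated folders."""
    p = PurePosixPath(path)
    if GENERATED_DIRS & set(p.parts[:-1]):
        return True
    return any(fnmatchcase(p.name, pattern) for pattern in GENERATED_PATTERNS)


def pick_model(changed_paths: list[str], cheap_model: str, deep_model: str) -> tuple[str, list[str]]:
    """Drop generated files, then escalate only if a remaining path sits in a high-risk folder."""
    reviewable = [path for path in changed_paths if not is_generated(path)]
    risky = any(HIGH_RISK_DIRS & set(PurePosixPath(path).parts[:-1]) for path in reviewable)
    return (deep_model if risky else cheap_model), reviewable
```

The path list could come from `git diff --name-only "origin/$BASE_REF...HEAD"`; the returned `reviewable` list tells you which file sections are worth sending at all.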
Common Failure Modes
The workflow works locally but fails in Actions
Check environment variable names first. Many CLIs expect OPENAI_API_KEY, while your secret may be named AI_API_KEY. Map it explicitly in the workflow.
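Mapping is only half the problem: a secret that was never created expands to an empty string in Actions, which many tools treat differently from an unset variable. A hedged sketch of a config loader for the `AI_*` names used in this guide; `load_ai_config` and `AIConfig` are hypothetical, empty values count as missing, `OPENAI_API_KEY` is accepted as a fallback, and a missing key fails with a message naming the variable instead of surfacing later as a bare 401.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AIConfig:
    api_key: str
    base_url: str
    model: str


def load_ai_config(env: Mapping[str, str] = os.environ,
                   default_base_url: str = "https://api.openai.com/v1",
                   default_model: str = "gpt-5.5-mini") -> AIConfig:
    """Read AI_API_KEY / AI_BASE_URL / AI_REVIEW_MODEL, treating empty strings as unset."""
    def get(name: str) -> str:
        return env.get(name, "").strip()

    api_key = get("AI_API_KEY") or get("OPENAI_API_KEY")
    if not api_key:
        raise SystemExit("AI_API_KEY is empty: add it as a repository secret and map it in the step's env:")
    return AIConfig(
        api_key=api_key,
        base_url=(get("AI_BASE_URL") or default_base_url).rstrip("/"),
        model=get("AI_REVIEW_MODEL") or default_model,
    )
```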
The PR comment is too long
Ask for a capped review: “maximum 12 bullets” or “only high-risk issues.” You can also truncate the output before posting, but it’s better to make the model concise.
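If you do truncate, cut on a line boundary and say that you cut, so a half-finished bullet is not mistaken for the whole review. GitHub rejects comment bodies longer than 65,536 characters; the sketch below stays under that with a margin, and its 60,000 default and `cap_review` name are arbitrary choices.

```python
def cap_review(body: str, max_chars: int = 60_000) -> str:
    """Trim a review to at most max_chars, ending on a full line plus a visible marker."""
    marker = "\n\n_(Review truncated by CI.)_"
    if len(body) <= max_chars:
        return body
    cut = body[: max_chars - len(marker)]
    # Back up to the previous newline, when there is one, so we never end mid-line.
    if "\n" in cut:
        cut = cut[: cut.rindex("\n")]
    return cut + marker
```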
The model comments on unchanged code
Tell it to review only visible diff lines. If you send full files, separate them clearly from the patch and ask for evidence.
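One way to make that separation explicit is to build the prompt from labeled sections, so the model can tell the patch under review from reference material. A sketch; `build_review_prompt` is a hypothetical helper, and the `=== ... ===` markers are an arbitrary convention chosen here, not something Codex or the API requires.

```python
from typing import Optional


def build_review_prompt(diff: str, context_files: Optional[dict[str, str]] = None) -> str:
    """Assemble a review prompt that keeps the patch and read-only file context apart."""
    parts = [
        "You are a CI code reviewer. Only comment on lines that appear in the PATCH section.",
        "FILE CONTEXT is reference material: do not report issues that exist only there.",
        "For every issue, quote the patch line it refers to. Do not claim you ran tests.",
        "",
        "=== PATCH (review this) ===",
        diff.rstrip("\n"),
        "=== END PATCH ===",
    ]
    for path, text in (context_files or {}).items():
        parts += ["", f"=== FILE CONTEXT: {path} (do not review) ===",
                  text.rstrip("\n"), "=== END FILE CONTEXT ==="]
    return "\n".join(parts)
```

Asking for a quoted patch line per issue is the "ask for evidence" part: a comment that cannot quote the patch is easy to spot and discard.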
Forked PRs cannot access secrets
That’s GitHub protecting you. Don’t bypass it casually. For open source repos, run a no-secret lint workflow on forks and reserve AI review for trusted branches or maintainer-triggered workflows.
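If a script should decide whether to skip the AI step, the pull request payload that Actions writes to the file named by `GITHUB_EVENT_PATH` carries enough to tell a fork from a same-repository branch. A minimal sketch; `is_trusted_pr` is a hypothetical helper, and it treats a missing head repository (for example, a deleted fork) as untrusted.

```python
import json
import os
from typing import Any


def is_trusted_pr(event: dict[str, Any]) -> bool:
    """True only when the PR head branch lives in the same repository as the base."""
    pr = event.get("pull_request") or {}
    head_repo = (pr.get("head") or {}).get("repo") or {}
    base_repo = event.get("repository") or {}
    head_name = head_repo.get("full_name")
    return head_name is not None and head_name == base_repo.get("full_name")


if __name__ == "__main__" and "GITHUB_EVENT_PATH" in os.environ:
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        trusted = is_trusted_pr(json.load(fh))
    print("trusted" if trusted else "fork or unknown: skip AI review")
```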
Final Recommendation
Start small: one PR comment, read-only `contents` permission, trimmed diffs, cheap model, ten-minute timeout. If the comments save reviewers time for a week, expand into test triage. If the comments are noisy, tighten the prompt before changing models.
Codex CLI in GitHub Actions should feel like a junior reviewer who never gets tired, not an unsupervised maintainer. Keep humans in the merge path and it becomes a useful CI layer instead of another source of chaos.