# Build an AI Code Review Bot with GitHub Actions in 2026
Code review is one of those things everyone agrees is important and nobody has enough time for. Your team opens 15 PRs a day, reviewers are backlogged, and half the comments end up being about formatting anyway.
What if a bot handled the first pass? Not replacing human reviewers, but catching the obvious stuff so humans can focus on architecture, logic, and design decisions that actually need a brain.
This tutorial walks through building an AI code review bot that runs on every pull request via GitHub Actions. It reads the diff, sends it to an AI model (Claude, GPT-5, or whatever you prefer), and posts inline comments directly on the PR. The whole thing takes about 30 minutes to set up and costs pennies per review.
## What the Bot Actually Does
Here's the flow:
- A developer opens or updates a pull request
- GitHub Actions triggers the workflow
- A Python script fetches the PR diff via the GitHub API
- The diff gets sent to an AI model with a code review prompt
- The model returns structured feedback (file, line, comment)
- The script posts each comment as an inline review on the PR
The bot catches things like: unused variables, potential null pointer issues, missing error handling, security red flags (hardcoded secrets, SQL injection patterns), and style inconsistencies. It won't catch everything a senior engineer would, but it catches enough to be worth running.
## Prerequisites
- A GitHub repository (public or private)
- An API key for Claude, GPT-5, or any OpenAI-compatible endpoint
- Basic familiarity with GitHub Actions and Python
## Step 1: The Review Script

Create a file at `.github/scripts/ai_review.py` in your repo. This is the core logic: it fetches the diff, calls the AI, and posts comments.
```python
import os
import json

import requests
from openai import OpenAI

# Config from environment
API_KEY = os.environ["AI_API_KEY"]
API_BASE = os.environ.get("AI_API_BASE", "https://api.openai.com/v1")
MODEL = os.environ.get("AI_MODEL", "claude-sonnet-4-6")
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
REPO = os.environ["GITHUB_REPOSITORY"]
PR_NUMBER = os.environ["PR_NUMBER"]

client = OpenAI(api_key=API_KEY, base_url=API_BASE)


def get_pr_diff():
    """Fetch the PR diff from the GitHub API."""
    url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}"
    headers = {
        "Authorization": f"token {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.v3.diff",
    }
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    return resp.text


def get_pr_files():
    """Get the list of changed files with patch info."""
    url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/files"
    headers = {"Authorization": f"token {GITHUB_TOKEN}"}
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    return resp.json()


def review_diff(diff_text):
    """Send the diff to the AI model for review."""
    prompt = """You are a senior code reviewer. Review this pull request diff and identify:

1. Bugs or potential runtime errors
2. Security issues (hardcoded secrets, injection, etc.)
3. Missing error handling
4. Performance concerns
5. Code style issues that affect readability

For each issue, respond with a JSON array of objects:

{
  "file": "path/to/file.py",
  "line": 42,
  "severity": "error" | "warning" | "suggestion",
  "comment": "Your review comment here"
}

Only flag real issues. Don't nitpick formatting if it's consistent.
If the code looks good, return an empty array: []

PR Diff:
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a code reviewer. Respond only with valid JSON."},
            {"role": "user", "content": prompt + diff_text},
        ],
        temperature=0.1,
        max_tokens=4000,
    )
    content = response.choices[0].message.content.strip()
    # Strip a markdown code fence if the model wrapped its JSON in one
    if content.startswith("```"):
        content = content.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(content)


def post_review_comments(comments):
    """Post inline comments on the PR."""
    if not comments:
        # Post a simple all-clear comment
        url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/reviews"
        headers = {"Authorization": f"token {GITHUB_TOKEN}"}
        data = {
            "body": "🤖 AI Review: Code looks good. No issues found.",
            "event": "COMMENT",
        }
        requests.post(url, headers=headers, json=data)
        return

    # Build a review with inline comments
    pr_files = get_pr_files()
    valid_comments = []
    for c in comments:
        # Verify the file exists in the PR; GitHub's API rejects
        # comments on files that aren't part of the diff
        matching = [f for f in pr_files if f["filename"] == c["file"]]
        if not matching:
            continue
        severity_emoji = {
            "error": "🔴",
            "warning": "🟡",
            "suggestion": "💡",
        }.get(c.get("severity", "suggestion"), "💡")
        valid_comments.append({
            "path": c["file"],
            "line": c["line"],
            "body": f"{severity_emoji} **{c.get('severity', 'suggestion').title()}**: {c['comment']}",
        })

    if not valid_comments:
        return

    url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/reviews"
    headers = {"Authorization": f"token {GITHUB_TOKEN}"}
    data = {
        "body": f"🤖 AI Review: Found {len(valid_comments)} issue(s) to look at.",
        "event": "COMMENT",
        "comments": valid_comments,
    }
    resp = requests.post(url, headers=headers, json=data)
    if resp.status_code != 200:
        print(f"Failed to post review: {resp.status_code} {resp.text}")
    else:
        print(f"Posted review with {len(valid_comments)} comments")


def main():
    print(f"Reviewing PR #{PR_NUMBER} in {REPO}")
    diff = get_pr_diff()

    # Truncate very large diffs to stay within token limits
    max_chars = 30000
    if len(diff) > max_chars:
        print(f"Diff truncated from {len(diff)} to {max_chars} chars")
        diff = diff[:max_chars] + "\n... (diff truncated)"

    comments = review_diff(diff)
    print(f"AI found {len(comments)} issues")
    post_review_comments(comments)


if __name__ == "__main__":
    main()
```
A few things worth noting in this script:
- The `temperature=0.1` keeps the model focused and consistent. You don't want creative code reviews.
- The diff gets truncated at 30K characters. Most PRs are well under this, but monster PRs with generated files can blow past token limits.
- The script validates that each commented file actually exists in the PR diff. GitHub's API rejects comments on files that aren't part of the PR.
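One fragile spot is the fence-stripping step: models wrap their JSON output in markdown fences inconsistently, so that parsing logic is worth sanity-checking in isolation. A minimal standalone check, mirroring the snippet from `review_diff` (the `extract_json` helper name is mine, not part of the script):

```python
import json

def extract_json(content: str):
    """Mirror the fence-stripping logic in review_diff: drop a
    leading ``` fence (with optional language tag) before parsing."""
    content = content.strip()
    if content.startswith("```"):
        content = content.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(content)

bare = '[{"file": "app.py", "line": 3, "severity": "warning", "comment": "x"}]'
fenced = "```json\n" + bare + "\n```"

# Both forms should parse to the same list of issues
assert extract_json(bare) == extract_json(fenced)
```

If you extend the bot, this is the function to harden first; a malformed model response here is the most common failure mode.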
## Step 2: The GitHub Actions Workflow

Create `.github/workflows/ai-review.yml`:
```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install openai requests

      - name: Run AI Review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          AI_API_KEY: ${{ secrets.AI_API_KEY }}
          AI_API_BASE: ${{ secrets.AI_API_BASE }}
          AI_MODEL: ${{ vars.AI_MODEL || 'claude-sonnet-4-6' }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: python .github/scripts/ai_review.py
```
Two secrets to configure in your repo settings (Settings → Secrets and variables → Actions):

- `AI_API_KEY`: your API key
- `AI_API_BASE`: the API endpoint URL (e.g., `https://api.kissapi.ai/v1`)
`AI_MODEL` is set as a repository variable (not a secret) so you can change it without re-entering secrets. It defaults to Claude Sonnet 4.6, which hits the sweet spot of quality and cost for code review.
## Step 3: Choosing the Right Model
Not all models are equal at code review. Here's what I've found after running this on several repos:
| Model | Review Quality | Cost per Review* | Speed |
|---|---|---|---|
| Claude Opus 4.6 | Excellent | ~$0.08 | 8-15s |
| Claude Sonnet 4.6 | Very Good | ~$0.02 | 3-6s |
| GPT-5 | Very Good | ~$0.05 | 4-8s |
| DeepSeek V3 | Good | ~$0.005 | 5-10s |
*Estimated cost for a typical 500-line PR diff.
Claude Sonnet 4.6 is my default recommendation. It catches most of what Opus catches at a quarter of the cost. For open-source projects with tight budgets, DeepSeek V3 is surprisingly capable: it misses some subtle issues but nails the obvious ones.
The nice thing about using an OpenAI-compatible endpoint: switching models is a one-line config change. Start with Sonnet, try Opus on a few PRs, see if the extra cost is worth it for your codebase.
## Handling Large PRs

The 30K character truncation works for most PRs, but sometimes you get a 2,000-line refactor. A smarter approach is to review file by file instead of sending the whole diff at once:
```python
def review_large_pr():
    """Review large PRs file-by-file."""
    files = get_pr_files()
    all_comments = []
    for f in files:
        # Skip non-code files
        if f["filename"].endswith((".md", ".json", ".lock", ".svg")):
            continue
        # Skip files with no patch (binary, too large)
        if "patch" not in f:
            continue
        patch = f["patch"]
        if len(patch) < 20:  # Skip trivial changes
            continue
        comments = review_diff(f"File: {f['filename']}\n\n{patch}")
        all_comments.extend(comments)
    return all_comments
```
This costs more (one API call per file), but each review is more focused. The model has full context on each file's changes without being overwhelmed by a massive diff.
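Wiring the two strategies together can be a simple size check in `main()`. A sketch, where the 30K threshold matches the truncation limit above and the 25-file cutoff is an arbitrary assumption you'd tune for your repo:

```python
MAX_DIFF_CHARS = 30000  # same threshold as the truncation path

def pick_strategy(diff_text: str, file_count: int) -> str:
    """Return which review path to take for this PR."""
    # Big diffs or very wide PRs get the per-file treatment;
    # everything else goes through one whole-diff call.
    if len(diff_text) > MAX_DIFF_CHARS or file_count > 25:
        return "per-file"
    return "whole-diff"

# In main(), roughly:
#   comments = review_large_pr() if pick_strategy(diff, n) == "per-file"
#              else review_diff(diff)
```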
## Making the Bot Smarter
The basic version works, but a few tweaks make it significantly more useful:
### Add repo context to the prompt
Include a brief description of your project in the system prompt. "This is a Python FastAPI backend for a fintech app" helps the model flag domain-specific issues it would otherwise miss.
### Filter by file type
Don't waste tokens reviewing auto-generated files, lockfiles, or migrations. Add a skip list:
```python
SKIP_PATTERNS = [
    "*.lock", "*.min.js", "*.generated.*",
    "migrations/*", "vendor/*", "__snapshots__/*",
]
```
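To actually apply the list, `fnmatch` from the standard library is enough. One caveat: this is a simplification of gitignore semantics, since `fnmatch`'s `*` also matches across `/`:

```python
from fnmatch import fnmatch

SKIP_PATTERNS = [
    "*.lock", "*.min.js", "*.generated.*",
    "migrations/*", "vendor/*", "__snapshots__/*",
]

def should_skip(filename: str) -> bool:
    """True if the file matches any skip pattern."""
    return any(fnmatch(filename, pattern) for pattern in SKIP_PATTERNS)

assert should_skip("poetry.lock")
assert should_skip("vendor/jquery.js")
assert not should_skip("app/main.py")
```

Call `should_skip` in the file loop (in `review_large_pr`, for example) before spending tokens on a patch.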
### Rate limit awareness
If your team opens a lot of PRs, you might hit API rate limits. Add a concurrency limit to the workflow:
```yaml
concurrency:
  group: ai-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true
```
This cancels any in-progress review if the PR gets updated, so you're not paying for reviews on outdated code.
### Custom review rules

Drop a `.ai-review-rules` file in your repo root with project-specific guidelines:
```text
# .ai-review-rules
- All database queries must use parameterized statements
- API endpoints must validate request body with Pydantic models
- No print() statements in production code; use the logger
- All public functions need docstrings
```
Then append these rules to the review prompt. Now the bot enforces your team's conventions, not just generic best practices.
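Loading and appending the rules is a few lines. A sketch, assuming the file sits in the repo root where the Action checks out the code (`load_review_rules` is a hypothetical helper, not part of the script above):

```python
from pathlib import Path

def load_review_rules(path: str = ".ai-review-rules") -> str:
    """Return project rules as a prompt suffix, or '' if the file is absent."""
    rules_file = Path(path)
    if not rules_file.exists():
        return ""
    return "\n\nProject-specific rules to enforce:\n" + rules_file.read_text()

# In review_diff(), something like:
#   prompt = base_prompt + load_review_rules() + "\nPR Diff:\n"
```

Returning an empty string when the file is missing means repos without a rules file keep working unchanged.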
## What This Costs in Practice
Real numbers from a team of 8 developers, ~12 PRs per day, using Claude Sonnet 4.6:
- Average diff size: ~400 lines (roughly 8K tokens input)
- Average review output: ~500 tokens
- Cost per review: ~$0.02
- Monthly cost: ~$5.30
Five bucks a month for automated first-pass code review. That's less than a single coffee. Even if you use Opus for everything, you're looking at maybe $20/month, still nothing compared to the engineering time it saves.
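The monthly figure is straightforward arithmetic from the numbers above (22 working days per month is my assumption):

```python
cost_per_review = 0.02    # Sonnet 4.6, typical diff (see table above)
prs_per_day = 12
workdays_per_month = 22   # assumption

monthly_cost = cost_per_review * prs_per_day * workdays_per_month
print(f"~${monthly_cost:.2f}/month")  # ~$5.28/month
```

Swap in the per-review cost for your model of choice to budget before you commit.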
## Get an API Key in 60 Seconds
KissAPI gives you one endpoint for Claude, GPT-5, DeepSeek, and 200+ models. Pay-as-you-go, no subscription. Works with the code in this tutorial out of the box.
## Limitations (Be Honest About Them)
This bot is a first-pass filter, not a replacement for human review. Things it's bad at:
- Architecture decisions. It can't tell you if a feature belongs in service A or service B.
- Business logic correctness. It doesn't know your domain rules unless you spell them out.
- Cross-file dependencies. It reviews the diff, not your entire codebase. It might miss that a function rename broke 3 other files.
- False positives. Expect some. The model occasionally flags correct code as problematic. Developers learn to calibrate quickly.
The goal isn't perfection. It's catching the 60% of issues that are mechanical: the stuff that wastes a senior engineer's time to point out. Missing semicolons, unchecked error returns, that TODO someone left in production code three months ago.
## Wrapping Up
You now have a working AI code review bot that:
- Triggers automatically on every PR
- Posts inline comments at the exact lines that need attention
- Works with any OpenAI-compatible model
- Costs less than a cup of coffee per month
The full code is about 100 lines of Python and 20 lines of YAML. Fork it, customize the prompt for your codebase, and let it run. Your reviewers will thank you.