# Build an AI Code Review Bot with GitHub Actions in 2026
Code review is one of those things everyone agrees is important and nobody has enough time for. Your team opens 15 PRs a day, reviewers are backlogged, and half the comments end up being about formatting anyway.
What if a bot handled the first pass? Not replacing human reviewers, but catching the obvious stuff so humans can focus on architecture, logic, and design decisions that actually need a brain.
This tutorial walks through building an AI code review bot that runs on every pull request via GitHub Actions. It reads the diff, sends it to an AI model (Claude, GPT-5, or whatever you prefer), and posts inline comments directly on the PR. The whole thing takes about 30 minutes to set up and costs pennies per review.
## What the Bot Actually Does
Here's the flow:
- A developer opens or updates a pull request
- GitHub Actions triggers the workflow
- A Python script fetches the PR diff via the GitHub API
- The diff gets sent to an AI model with a code review prompt
- The model returns structured feedback (file, line, comment)
- The script posts each comment as an inline review on the PR
The bot catches things like: unused variables, potential null pointer issues, missing error handling, security red flags (hardcoded secrets, SQL injection patterns), and style inconsistencies. It won't catch everything a senior engineer would, but it catches enough to be worth running.
## Prerequisites
- A GitHub repository (public or private)
- An API key for Claude, GPT-5, or any OpenAI-compatible endpoint
- Basic familiarity with GitHub Actions and Python
## Step 1: The Review Script

Create a file at `.github/scripts/ai_review.py` in your repo. This is the core logic: it fetches the diff, calls the AI, and posts comments.
```python
import os
import json

import requests
from openai import OpenAI

# Config from environment
API_KEY = os.environ["AI_API_KEY"]
API_BASE = os.environ.get("AI_API_BASE", "https://api.openai.com/v1")
MODEL = os.environ.get("AI_MODEL", "claude-sonnet-4-6")
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
REPO = os.environ["GITHUB_REPOSITORY"]
PR_NUMBER = os.environ["PR_NUMBER"]

client = OpenAI(api_key=API_KEY, base_url=API_BASE)


def get_pr_diff():
    """Fetch the PR diff from the GitHub API."""
    url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}"
    headers = {
        "Authorization": f"token {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.v3.diff",
    }
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    return resp.text


def get_pr_files():
    """Get the list of changed files with patch info."""
    url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/files"
    headers = {"Authorization": f"token {GITHUB_TOKEN}"}
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    return resp.json()


def review_diff(diff_text):
    """Send the diff to the AI model for review."""
    prompt = """You are a senior code reviewer. Review this pull request diff and identify:

1. Bugs or potential runtime errors
2. Security issues (hardcoded secrets, injection, etc.)
3. Missing error handling
4. Performance concerns
5. Code style issues that affect readability

For each issue, respond with a JSON array of objects:

{
  "file": "path/to/file.py",
  "line": 42,
  "severity": "error" | "warning" | "suggestion",
  "comment": "Your review comment here"
}

Only flag real issues. Don't nitpick formatting if it's consistent.
If the code looks good, return an empty array: []

PR Diff:
"""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a code reviewer. Respond only with valid JSON."},
            {"role": "user", "content": prompt + diff_text},
        ],
        temperature=0.1,
        max_tokens=4000,
    )
    content = response.choices[0].message.content.strip()
    # Strip a markdown code fence if the model wrapped its JSON in one
    if content.startswith("```"):
        content = content.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(content)


def post_review_comments(comments):
    """Post inline comments on the PR."""
    if not comments:
        # Post a simple all-clear comment
        url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/reviews"
        headers = {"Authorization": f"token {GITHUB_TOKEN}"}
        data = {
            "body": "🤖 AI Review: Code looks good. No issues found.",
            "event": "COMMENT",
        }
        requests.post(url, headers=headers, json=data)
        return

    # Build a review with inline comments
    pr_files = get_pr_files()
    valid_comments = []
    for c in comments:
        # Verify the file exists in the PR; GitHub's API rejects
        # comments on files that aren't part of the diff
        matching = [f for f in pr_files if f["filename"] == c["file"]]
        if not matching:
            continue
        severity_emoji = {
            "error": "🔴",
            "warning": "🟡",
            "suggestion": "💡",
        }.get(c.get("severity", "suggestion"), "💡")
        valid_comments.append({
            "path": c["file"],
            "line": c["line"],
            "body": f"{severity_emoji} **{c.get('severity', 'suggestion').title()}**: {c['comment']}",
        })

    if not valid_comments:
        return

    url = f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/reviews"
    headers = {"Authorization": f"token {GITHUB_TOKEN}"}
    data = {
        "body": f"🤖 AI Review: Found {len(valid_comments)} issue(s) to look at.",
        "event": "COMMENT",
        "comments": valid_comments,
    }
    resp = requests.post(url, headers=headers, json=data)
    if resp.status_code != 200:
        print(f"Failed to post review: {resp.status_code} {resp.text}")
    else:
        print(f"Posted review with {len(valid_comments)} comments")


def main():
    print(f"Reviewing PR #{PR_NUMBER} in {REPO}")
    diff = get_pr_diff()

    # Truncate very large diffs to stay within token limits
    max_chars = 30000
    if len(diff) > max_chars:
        print(f"Diff truncated from {len(diff)} to {max_chars} chars")
        diff = diff[:max_chars] + "\n... (diff truncated)"

    comments = review_diff(diff)
    print(f"AI found {len(comments)} issues")
    post_review_comments(comments)


if __name__ == "__main__":
    main()
```
A few things worth noting in this script:
- The `temperature=0.1` keeps the model focused and consistent. You don't want creative code reviews.
- The diff gets truncated at 30K characters. Most PRs are well under this, but monster PRs with generated files can blow past token limits.
- The script validates that each commented file actually exists in the PR diff. GitHub's API rejects comments on files that aren't part of the PR.
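One fragile spot is the fence-stripping step: models wrap their JSON output in markdown fences inconsistently, so that parsing logic is worth sanity-checking in isolation. A minimal standalone check, mirroring the snippet from `review_diff` (the `extract_json` helper name is mine, not part of the script):

```python
import json

def extract_json(content: str):
    """Mirror the fence-stripping logic in review_diff: drop a
    leading ``` fence (with optional language tag) before parsing."""
    content = content.strip()
    if content.startswith("```"):
        content = content.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(content)

bare = '[{"file": "app.py", "line": 3, "severity": "warning", "comment": "x"}]'
fenced = "```json\n" + bare + "\n```"

# Both forms should parse to the same list of issues
assert extract_json(bare) == extract_json(fenced)
```

If you extend the bot, this is the function to harden first; a malformed model response here is the most common failure mode.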
## Step 2: The GitHub Actions Workflow

Create `.github/workflows/ai-review.yml`:
```yaml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install openai requests

      - name: Run AI Review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          AI_API_KEY: ${{ secrets.AI_API_KEY }}
          AI_API_BASE: ${{ secrets.AI_API_BASE }}
          AI_MODEL: ${{ vars.AI_MODEL || 'claude-sonnet-4-6' }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: python .github/scripts/ai_review.py
```
Two secrets to configure in your repo settings (Settings → Secrets and variables → Actions):

- `AI_API_KEY`: your API key
- `AI_API_BASE`: the API endpoint URL (e.g., `https://api.kissapi.ai/v1`)
`AI_MODEL` is set as a repository variable (not a secret) so you can change it without re-entering secrets. It defaults to Claude Sonnet 4.6, which hits the sweet spot of quality and cost for code review.
## Step 3: Choosing the Right Model
Not all models are equal at code review. Here's what I've found after running this on several repos:
| Model | Review Quality | Cost per Review* | Speed |
|---|---|---|---|
| Claude Opus 4.6 | Excellent | ~$0.08 | 8-15s |
| Claude Sonnet 4.6 | Very Good | ~$0.02 | 3-6s |
| GPT-5 | Very Good | ~$0.05 | 4-8s |
| DeepSeek V3 | Good | ~$0.005 | 5-10s |
*Estimated cost for a typical 500-line PR diff.
Claude Sonnet 4.6 is my default recommendation. It catches most of what Opus catches at a quarter of the cost. For open-source projects with tight budgets, DeepSeek V3 is surprisingly capable: it misses some subtle issues but nails the obvious ones.
The nice thing about using an OpenAI-compatible endpoint: switching models is a one-line config change. Start with Sonnet, try Opus on a few PRs, see if the extra cost is worth it for your codebase.
## Handling Large PRs

The 30K character truncation works for most PRs, but sometimes you get a 2,000-line refactor. A smarter approach is to review file by file instead of sending the whole diff at once:
```python
def review_large_pr():
    """Review large PRs file-by-file."""
    files = get_pr_files()
    all_comments = []
    for f in files:
        # Skip non-code files
        if f["filename"].endswith((".md", ".json", ".lock", ".svg")):
            continue
        # Skip files with no patch (binary, too large)
        if "patch" not in f:
            continue
        patch = f["patch"]
        if len(patch) < 20:  # Skip trivial changes
            continue
        comments = review_diff(f"File: {f['filename']}\n\n{patch}")
        all_comments.extend(comments)
    return all_comments
```
This costs more (one API call per file), but each review is more focused. The model has full context on each file's changes without being overwhelmed by a massive diff.
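Wiring the two strategies together can be a simple size check in `main()`. A sketch, where the 30K threshold matches the truncation limit above and the 25-file cutoff is an arbitrary assumption you'd tune for your repo:

```python
MAX_DIFF_CHARS = 30000  # same threshold as the truncation path

def pick_strategy(diff_text: str, file_count: int) -> str:
    """Return which review path to take for this PR."""
    # Big diffs or very wide PRs get the per-file treatment;
    # everything else goes through one whole-diff call.
    if len(diff_text) > MAX_DIFF_CHARS or file_count > 25:
        return "per-file"
    return "whole-diff"

# In main(), roughly:
#   comments = review_large_pr() if pick_strategy(diff, n) == "per-file"
#              else review_diff(diff)
```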
## Making the Bot Smarter
The basic version works, but a few tweaks make it significantly more useful:
### Add repo context to the prompt
Include a brief description of your project in the system prompt. "This is a Python FastAPI backend for a fintech app" helps the model flag domain-specific issues it would otherwise miss.
### Filter by file type
Don't waste tokens reviewing auto-generated files, lockfiles, or migrations. Add a skip list:
```python
SKIP_PATTERNS = [
    "*.lock", "*.min.js", "*.generated.*",
    "migrations/*", "vendor/*", "__snapshots__/*",
]
```
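To actually apply the list, `fnmatch` from the standard library is enough. One caveat: this is a simplification of gitignore semantics, since `fnmatch`'s `*` also matches across `/`:

```python
from fnmatch import fnmatch

SKIP_PATTERNS = [
    "*.lock", "*.min.js", "*.generated.*",
    "migrations/*", "vendor/*", "__snapshots__/*",
]

def should_skip(filename: str) -> bool:
    """True if the file matches any skip pattern."""
    return any(fnmatch(filename, pattern) for pattern in SKIP_PATTERNS)

assert should_skip("poetry.lock")
assert should_skip("vendor/jquery.js")
assert not should_skip("app/main.py")
```

Call `should_skip` in the file loop (in `review_large_pr`, for example) before spending tokens on a patch.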
### Rate limit awareness
If your team opens a lot of PRs, you might hit API rate limits. Add a concurrency limit to the workflow:
```yaml
concurrency:
  group: ai-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true
```
This cancels any in-progress review if the PR gets updated, so you're not paying for reviews on outdated code.
### Custom review rules

Drop a `.ai-review-rules` file in your repo root with project-specific guidelines:
```text
# .ai-review-rules
- All database queries must use parameterized statements
- API endpoints must validate request body with Pydantic models
- No print() statements in production code; use the logger
- All public functions need docstrings
```
Then append these rules to the review prompt. Now the bot enforces your team's conventions, not just generic best practices.
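Loading and appending the rules is a few lines. A sketch, assuming the file sits in the repo root where the Action checks out the code (`load_review_rules` is a hypothetical helper, not part of the script above):

```python
from pathlib import Path

def load_review_rules(path: str = ".ai-review-rules") -> str:
    """Return project rules as a prompt suffix, or '' if the file is absent."""
    rules_file = Path(path)
    if not rules_file.exists():
        return ""
    return "\n\nProject-specific rules to enforce:\n" + rules_file.read_text()

# In review_diff(), something like:
#   prompt = base_prompt + load_review_rules() + "\nPR Diff:\n"
```

Returning an empty string when the file is missing means repos without a rules file keep working unchanged.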
## What This Costs in Practice
Real numbers from a team of 8 developers, ~12 PRs per day, using Claude Sonnet 4.6:
- Average diff size: ~400 lines (roughly 8K tokens input)
- Average review output: ~500 tokens
- Cost per review: ~$0.02
- Monthly cost: ~$5.30
Five bucks a month for automated first-pass code review. That's less than a single coffee. Even if you use Opus for everything, you're looking at maybe $20/month, still nothing compared to the engineering time it saves.
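The monthly figure is straightforward arithmetic from the numbers above (22 working days per month is my assumption):

```python
cost_per_review = 0.02    # Sonnet 4.6, typical diff (see table above)
prs_per_day = 12
workdays_per_month = 22   # assumption

monthly_cost = cost_per_review * prs_per_day * workdays_per_month
print(f"~${monthly_cost:.2f}/month")  # ~$5.28/month
```

Swap in the per-review cost for your model of choice to budget before you commit.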
## Get an API Key in 60 Seconds
KissAPI gives you one endpoint for Claude, GPT-5, DeepSeek, and 200+ models. Pay-as-you-go, no subscription. Works with the code in this tutorial out of the box.
## Limitations (Be Honest About Them)
This bot is a first-pass filter, not a replacement for human review. Things it's bad at:
- Architecture decisions. It can't tell you if a feature belongs in service A or service B.
- Business logic correctness. It doesn't know your domain rules unless you spell them out.
- Cross-file dependencies. It reviews the diff, not your entire codebase. It might miss that a function rename broke 3 other files.
- False positives. Expect some. The model occasionally flags correct code as problematic. Developers learn to calibrate quickly.
The goal isn't perfection. It's catching the 60% of issues that are mechanical: the stuff that wastes a senior engineer's time to point out. Missing semicolons, unchecked error returns, that TODO someone left in production code three months ago.
## Wrapping Up
You now have a working AI code review bot that:
- Triggers automatically on every PR
- Posts inline comments at the exact lines that need attention
- Works with any OpenAI-compatible model
- Costs less than a cup of coffee per month
The full code is about 100 lines of Python and 20 lines of YAML. Fork it, customize the prompt for your codebase, and let it run. Your reviewers will thank you.