How to Build an AI Agent with Claude API in Python

Everyone's talking about AI agents. But most tutorials either hand-wave over the hard parts or dump a 500-line framework on you without explaining what's actually happening. Let's fix that.

In this guide, we'll build a real AI agent from scratch using Claude Sonnet 4.6 and Python. No frameworks. No LangChain. Just the OpenAI-compatible API, a loop, and some tools. By the end, you'll have a working agent that can search the web, read files, and make decisions on its own.

What Makes an Agent Different from a Chatbot?

A chatbot takes input, generates output. Done. An agent does something fundamentally different: it decides what to do next. It has a goal, it picks tools, it executes actions, and it loops until the job is done.

The core pattern is dead simple:

  1. Send a message to the LLM with a list of available tools
  2. The LLM either responds with text (done) or requests a tool call
  3. You execute the tool and send the result back
  4. Repeat until the LLM says it's finished

That's it. That's the agent loop. Everything else — memory, planning, multi-agent orchestration — is built on top of this.
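The four steps above can be sketched as a generic loop before we wire in a real model. Everything here is illustrative: `step_fn` stands in for the LLM call and returns either a final answer or a tool request, and the real version in Step 2 fills in the details.

```python
def run_loop(step_fn, execute_tool, max_turns=10):
    """Generic agent loop. step_fn inspects the history and returns either
    ("text", answer) when it's done or ("tool", call) to request an action;
    execute_tool runs the call and returns a result appended to history."""
    history = []
    for _ in range(max_turns):
        kind, payload = step_fn(history)
        if kind == "text":                         # step 2: plain text means done
            return payload
        history.append(execute_tool(payload))      # step 3: run tool, record result
    return None                                    # gave up after max_turns

# Toy example: "search" once, then answer based on the result.
def fake_llm(history):
    if not history:
        return ("tool", {"name": "search", "args": "weather"})
    return ("text", f"Based on: {history[-1]}")

answer = run_loop(fake_llm, lambda call: f"result for {call['args']}")
```

The loop itself is step 4: it keeps going until the model stops asking for tools or we hit the turn cap.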

Prerequisites

You'll need Python 3.10+ and the OpenAI SDK. We're using the OpenAI SDK because Claude is accessible through OpenAI-compatible gateways, which means the same code works with GPT-5, Gemini, or any other model.

pip install openai requests

You'll also need an API key. If you don't have direct Anthropic access, services like KissAPI give you an OpenAI-compatible endpoint that routes to Claude — same models, same capabilities, easier setup.
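The environment variable names in this guide (`API_KEY`, and `BRAVE_API_KEY` later) are just the names this tutorial picked; use whatever your setup requires. A tiny fail-fast check at startup saves a confusing crash deep inside the agent loop:

```python
import os

def require_env(*names: str) -> None:
    """Fail fast with a clear message if required environment variables are missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

# Call once before constructing the client or running the agent:
# require_env("API_KEY", "BRAVE_API_KEY")
```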

Step 1: Define Your Tools

Tools are just functions with a JSON schema that tells the model what arguments they accept. Let's create two practical tools: one that searches the web and one that reads local files.

import json
import os
import requests

def web_search(query: str) -> str:
    """Search the web and return top results."""
    # Using a simple search API (replace with your preferred provider)
    resp = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"]},
        params={"q": query, "count": 5},
        timeout=10,  # don't let a hung search stall the whole agent
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing bad JSON
    results = resp.json().get("web", {}).get("results", [])
    if not results:
        return "No results found."
    return "\n".join(
        f"- {r['title']}: {r['description']}" for r in results
    )

def read_file(path: str) -> str:
    """Read a local file and return its contents."""
    try:
        with open(path, "r") as f:
            content = f.read(10000)  # cap at 10k chars
        return content
    except FileNotFoundError:
        return f"Error: File '{path}' not found."
    except Exception as e:
        return f"Error reading file: {e}"

Nothing fancy. Real functions that do real things. Now we need to describe them in the format the API expects:

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a local file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Path to the file"
                    }
                },
                "required": ["path"]
            }
        }
    }
]

# Map function names to actual functions
tool_map = {
    "web_search": web_search,
    "read_file": read_file,
}
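An easy mistake is declaring a tool in the schema list but forgetting to register it in tool_map, or the other way around. A small startup check, sketched here with stub inputs, catches the mismatch before the model ever requests a tool you can't run:

```python
def check_tools(tools: list[dict], tool_map: dict) -> None:
    """Raise at startup if the schema list and the implementation map disagree."""
    declared = {t["function"]["name"] for t in tools}
    implemented = set(tool_map)
    if declared != implemented:
        raise ValueError(
            f"missing implementations: {sorted(declared - implemented)}, "
            f"undeclared tools: {sorted(implemented - declared)}"
        )

# Passes: one declared tool, one matching stub implementation.
check_tools(
    [{"type": "function", "function": {"name": "web_search"}}],
    {"web_search": lambda query: ""},
)
```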

Step 2: Build the Agent Loop

Here's where it gets interesting. The agent loop is the heart of the whole thing. It sends messages to Claude, checks if the model wants to call a tool, executes it, and feeds the result back.

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["API_KEY"],
    base_url="https://api.kissapi.ai/v1"  # or your preferred endpoint
)

def run_agent(user_message: str, max_turns: int = 10) -> str:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful research assistant. "
                "Use the available tools to find information and answer questions. "
                "Always verify claims with web searches when possible. "
                "Think step by step before acting."
            )
        },
        {"role": "user", "content": user_message}
    ]

    for turn in range(max_turns):
        response = client.chat.completions.create(
            model="claude-sonnet-4-6",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )

        msg = response.choices[0].message

        # If no tool calls, we're done
        if not msg.tool_calls:
            return msg.content

        # Add the assistant's message (with tool calls) to history
        messages.append(msg)

        # Execute each tool call
        for tool_call in msg.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)

            print(f"  → Calling {fn_name}({fn_args})")

            # Execute the function
            result = tool_map[fn_name](**fn_args)

            # Add tool result to messages
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })

    return "Agent reached maximum turns without completing."

That's a working agent in about 40 lines. No framework needed. Let's test it:

answer = run_agent("What's the latest news about Claude Sonnet 4.6?")
print(answer)

The model will search the web, read the results, and synthesize an answer. If it needs more info, it'll search again. The loop handles everything.

Step 3: Add Conversation Memory

Our agent forgets everything between calls. For a one-shot research task, that's fine. But if you want a persistent assistant, you need memory. The simplest approach: keep the message history and persist it.

class Agent:
    def __init__(self, model="claude-sonnet-4-6"):
        self.model = model
        self.messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant with access to tools. "
                           "Use them when needed. Be concise."
            }
        ]
        self.client = OpenAI(
            api_key=os.environ["API_KEY"],
            base_url="https://api.kissapi.ai/v1"
        )

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})

        for _ in range(10):  # max tool-use turns
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                tools=tools,
                tool_choice="auto"
            )

            msg = response.choices[0].message

            if not msg.tool_calls:
                self.messages.append(msg)
                return msg.content

            self.messages.append(msg)
            for tc in msg.tool_calls:
                fn = tool_map[tc.function.name]
                args = json.loads(tc.function.arguments)
                result = fn(**args)
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": str(result)
                })

        return "Max turns reached."

    def save(self, path="agent_memory.json"):
        # SDK message objects aren't plain dicts; convert them so they
        # round-trip through JSON and can be sent back to the API after load()
        serializable = [
            m if isinstance(m, dict) else m.model_dump(exclude_none=True)
            for m in self.messages
        ]
        with open(path, "w") as f:
            json.dump(serializable, f)

    def load(self, path="agent_memory.json"):
        if os.path.exists(path):
            with open(path) as f:
                self.messages = json.load(f)

Now you can have multi-turn conversations:

agent = Agent()
agent.load()  # resume previous conversation if exists

print(agent.chat("Find the pricing for Claude Sonnet 4.6"))
print(agent.chat("How does that compare to GPT-5?"))

agent.save()  # persist for next time
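One caveat with persistent memory: the message list grows without bound, and every turn resends all of it. A simple mitigation, sketched below with an arbitrary cutoff, is to keep the system prompt plus the most recent messages. Be careful not to cut between an assistant message carrying tool_calls and its tool results; the sketch guards against starting the window on an orphaned tool message.

```python
def trim_history(messages: list[dict], max_messages: int = 40) -> list[dict]:
    """Keep the system prompt plus the most recent messages (cutoff is arbitrary)."""
    if len(messages) <= max_messages:
        return messages
    kept = messages[-(max_messages - 1):]
    # Drop leading tool results whose originating assistant message was cut off
    while kept and kept[0].get("role") == "tool":
        kept = kept[1:]
    return [messages[0]] + kept
```

Call it on self.messages before each API request, or once in load(), depending on how aggressive you want to be.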

Step 4: Add Error Handling (Don't Skip This)

Production agents need to handle failures gracefully. API calls fail. Tools throw exceptions. Rate limits hit. Here's the minimum you should add:

import time

def safe_tool_call(fn, args, retries=2):
    """Execute a tool with retry logic."""
    for attempt in range(retries + 1):
        try:
            return fn(**args)
        except Exception as e:
            if attempt == retries:
                return f"Tool error after {retries + 1} attempts: {e}"
            time.sleep(1 * (attempt + 1))  # simple backoff

def run_agent_safe(user_message: str, max_turns=10) -> str:
    messages = [
        {"role": "system", "content": "You are a research assistant."},
        {"role": "user", "content": user_message}
    ]

    for turn in range(max_turns):
        try:
            response = client.chat.completions.create(
                model="claude-sonnet-4-6",
                messages=messages,
                tools=tools,
                tool_choice="auto",
                timeout=30
            )
        except Exception:
            # Retry once on transient API errors (timeouts, rate limits)
            time.sleep(2)
            try:
                response = client.chat.completions.create(
                    model="claude-sonnet-4-6",
                    messages=messages,
                    tools=tools,
                    tool_choice="auto",
                    timeout=30
                )
            except Exception as e2:
                return f"API error: {e2}"

        msg = response.choices[0].message

        if not msg.tool_calls:
            return msg.content

        messages.append(msg)
        for tc in msg.tool_calls:
            fn = tool_map.get(tc.function.name)
            if not fn:
                result = f"Unknown tool: {tc.function.name}"
            else:
                args = json.loads(tc.function.arguments)
                result = safe_tool_call(fn, args)

            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": str(result)
            })

    return "Max turns reached."
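One more failure mode worth guarding: models occasionally emit malformed JSON in tool arguments, and the bare json.loads in the loop above would crash on it. A defensive parse keeps the agent alive; the idea (a sketch, not the only design) is to feed the error string back as the tool result so the model can correct itself on the next turn.

```python
import json

def parse_tool_args(raw: str):
    """Return (args, None) on success or (None, error_message) on bad input."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        return None, f"Invalid JSON in tool arguments: {e}"
    if not isinstance(parsed, dict):
        return None, f"Expected a JSON object, got {type(parsed).__name__}"
    return parsed, None
```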

Choosing the Right Model for Your Agent

Not every agent task needs the same model. Here's a practical breakdown:

| Use Case                               | Recommended Model            | Why                                            |
|----------------------------------------|------------------------------|------------------------------------------------|
| Complex research, multi-step reasoning | Claude Opus 4.6              | Best accuracy on hard tasks                    |
| General-purpose agent, coding tasks    | Claude Sonnet 4.6            | Near-Opus quality, 5x cheaper                  |
| Simple tool routing, classification    | Claude Haiku 4.5             | Fast and cheap for simple decisions            |
| High-volume production agents          | Sonnet 4.6 + Haiku fallback  | Use Haiku for easy turns, Sonnet for hard ones |
Sonnet 4.6 is the sweet spot for most agent workloads. It's Anthropic's newest Sonnet release, and benchmarks show it matching Opus 4.5 on most tasks at Sonnet pricing. That matters for agents, where a single task can mean dozens of API calls.
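The last row of the table, routing between models, can start as a simple heuristic on the incoming message. This is a toy sketch: the markers and length threshold are arbitrary, and the model names are the ones used elsewhere in this guide. A production router might instead ask a cheap model to classify difficulty first.

```python
def pick_model(user_message: str) -> str:
    """Route simple-looking requests to Haiku, everything else to Sonnet."""
    hard_markers = ("compare", "analyze", "research", "summarize", "plan")
    text = user_message.lower()
    if len(user_message) > 300 or any(m in text for m in hard_markers):
        return "claude-sonnet-4-6"
    return "claude-haiku-4-5"
```

Pass the result as the model argument in the agent loop; because both models sit behind the same OpenAI-compatible endpoint, nothing else changes.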

Common Pitfalls

After building a few agents, here's what trips people up:

  - No turn cap. Without max_turns, a confused model can loop on tool calls forever. Always bound the loop.
  - Unbounded history. Every turn resends the full message list, so costs grow fast over a long session. Trim or summarize old messages.
  - Trusting tool arguments. The model decides what to pass to your functions, so validate paths, queries, and types before executing, especially for anything that touches the filesystem.
  - Swallowing tool errors. Return error text to the model instead of raising; it will usually correct course on the next turn.
  - Vague tool descriptions. The model picks tools based on the description strings, so write them the way you'd explain the tool to a new teammate.

Where to Go from Here

What we built is a solid foundation. From here, you could add:

  - More tools: code execution, database queries, sending email, anything you can wrap in a function with a JSON schema.
  - Planning: have the model write out a plan as its first step, then execute it tool call by tool call.
  - Memory beyond the transcript: summarize old conversations into notes the agent can search.
  - Multi-agent orchestration: a coordinator agent that delegates subtasks to specialized agents, each with its own tools.
  - Streaming: show tool calls and partial answers as they happen instead of waiting for the final response.

The beauty of building from scratch is you understand every piece. When something breaks — and it will — you know exactly where to look.

Get Your API Key in 30 Seconds

KissAPI gives you access to Claude Sonnet 4.6, Opus 4.6, GPT-5, and more through one OpenAI-compatible endpoint. $1 free credits to start.

Start Free →