Gemini 3.1 Pro API Guide: Pricing, Setup & Code Examples (2026)

Published April 2, 2026 · 9 min read

Google's Gemini 3.1 Pro quietly became one of the most interesting models in the API market. It shipped in February 2026 with a 1-million-token context window, scored 77.1% on ARC-AGI-2 (higher than any other Pro-tier model at launch), and costs $2/$12 per million input/output tokens. That's about 20% less than GPT-5.4 charges for input ($2.50).

If you haven't tried it yet, this guide covers everything: pricing, how to set it up with Google's SDK or any OpenAI-compatible client, code examples in Python and Node.js, and an honest look at where it shines versus where you're better off with Claude or GPT-5.4.

Gemini 3.1 Pro Pricing

Here's the pricing table for the current Gemini lineup as of April 2026:

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context Window
Gemini 3.1 Pro	$2.00	$12.00	1,000,000
Gemini 3.0 Flash	$0.15	$0.60	1,000,000
Gemini 2.5 Flash-Lite	$0.10	$0.40	1,000,000

For context, here's how that stacks up against the competition:

Model	Input	Output	Context
Gemini 3.1 Pro	$2.00	$12.00	1M
GPT-5.4	$2.50	$15.00	256K
Claude Sonnet 4.6	$3.00	$15.00	200K
Claude Opus 4.6	$15.00	$75.00	200K
DeepSeek V3.2	$0.27	$1.10	128K

The pricing tells a story. Gemini 3.1 Pro is 20% cheaper than GPT-5.4 on input and output, with 4x the context window. For input-heavy workloads — feeding large codebases, long documents, or conversation histories — that gap compounds fast.
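
To see how the gap plays out on a concrete request, here's a small cost estimator. It's a sketch: the prices come straight from the table above, while the dictionary keys are just labels and the 150K-in / 2K-out workload is a made-up example.

```python
# Prices in USD per 1M tokens (input, output), copied from the comparison table.
# The keys are labels for this sketch, not guaranteed API model IDs.
PRICES = {
    "gemini-3.1-pro": (2.00, 12.00),
    "gpt-5.4": (2.50, 15.00),
    "claude-sonnet-4.6": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 150K tokens of documents in, a 2K-token answer out.
for name in PRICES:
    print(f"{name}: ${estimate_cost(name, 150_000, 2_000):.3f}")
```

With an input-heavy request like this one, almost all of the cost sits on the input side, which is where the per-token difference shows up.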

What Makes Gemini 3.1 Pro Different

Numbers aside, there are a few things that set this model apart from the pack.

1M Token Context Window (For Real)

Many models advertise large context windows but lose recall well before they reach the limit. Gemini 3.1 Pro was built around its 1M window, and Google's own "needle in a haystack" tests report near-perfect recall across the full range. That matters if you're doing things like:

Analyzing entire repositories in a single prompt
Processing long legal or financial documents without chunking
Maintaining very long conversation histories for agent workflows
Comparing multiple large files side by side

You can feed it a 500K-token codebase and ask "where's the bug in the authentication flow?" — and it'll actually find it. Try that with a 200K model and you're chunking, summarizing, and hoping nothing falls through the cracks.
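
Before you send a whole repository, it helps to check that it actually fits. The sketch below packs files into one prompt and refuses if the estimate exceeds the window. The 4-characters-per-token ratio is a rough assumption, not a real tokenizer, and build_prompt, the file markers, and the output reserve are all hypothetical choices for illustration.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, from the pricing table
CHARS_PER_TOKEN = 4         # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def build_prompt(files: dict[str, str], question: str,
                 reserve_for_output: int = 8_000) -> tuple[str, int]:
    """Join all files into one prompt; raise if the estimate exceeds the window."""
    parts = [f"### FILE: {path}\n{source}" for path, source in sorted(files.items())]
    prompt = "\n\n".join(parts) + f"\n\n### QUESTION\n{question}"
    needed = estimate_tokens(prompt) + reserve_for_output
    if needed > CONTEXT_WINDOW:
        raise ValueError(f"~{needed} tokens needed, window is {CONTEXT_WINDOW}")
    return prompt, needed


# Tiny stand-in for a real repository.
repo = {"auth/login.py": "def login(user): ...", "auth/token.py": "def issue(): ..."}
prompt, needed = build_prompt(repo, "Where's the bug in the authentication flow?")
print(f"estimated budget: ~{needed} tokens")
```

For a real check, swap the heuristic for the provider's token-counting call; the estimate here is only good enough to catch obvious overruns.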

Native Multimodal Input

Gemini 3.1 Pro handles text, images, audio, and video natively. Not as a bolted-on feature — it was trained on all modalities from the start. Image input costs about $0.0011 per image (560 tokens). You can mix a screenshot of a UI bug with your code and ask it to diagnose the issue.
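
If you're going through an OpenAI-style client, a mixed text-and-image message might look like the sketch below. It uses the OpenAI chat format's image_url content part with a base64 data URL; whether a particular gateway passes that through to Gemini is an assumption you should verify. image_message is a hypothetical helper name. The last lines just redo the per-image arithmetic from the paragraph above.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message that mixes text and an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes; in practice read them from your screenshot file.
message = image_message("Why is this button misaligned?", b"placeholder-image-bytes")
print(message["content"][0]["text"])

# Cost check with the figures quoted above: 560 tokens at $2.00 per 1M input tokens.
IMAGE_TOKENS = 560
print(f"${IMAGE_TOKENS * 2.00 / 1_000_000:.5f} per image")
```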

Native Tool Use

The model has built-in support for function calling and tool use without special prompting tricks. It can generate and execute Python code, call external APIs through defined function schemas, and chain multiple tool calls in a single turn. If you're building agents, this matters.
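
Here's roughly what the function-calling loop looks like on your side. This sketch uses the OpenAI-style "tools" schema and runs offline: get_weather is a hypothetical stubbed function, and the call at the bottom simulates the name and JSON arguments a model would return rather than making a real request.

```python
import json

# Tool schema in the OpenAI-style "tools" format. get_weather is hypothetical.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> dict:
    # Stubbed result so the sketch runs without network access.
    return {"city": city, "temp_c": 21}


# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one model-requested call and return JSON for the tool message."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    return json.dumps(REGISTRY[name](**args))


# Simulated tool call, standing in for what the model would send back.
print(run_tool_call("get_weather", '{"city": "Oslo"}'))
```

In a real loop you'd pass TOOLS with the request, run each returned call through something like run_tool_call, and send the result back as a tool message so the model can continue.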

Thinking Mode

Like Claude's extended thinking and OpenAI's o-series, Gemini 3.1 Pro supports a thinking mode where it reasons step-by-step before answering. Google calls these "thinking levels" — you can control how much reasoning budget the model gets. Useful for math, complex debugging, and architectural decisions.
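
One way to avoid paying for reasoning you don't need is to pick the level per task. The sketch below is only that, a sketch: the task labels come from the paragraph above, but the "low"/"high" values and the thinking_level key are assumptions about the config shape, so check the current SDK docs before wiring it into a request.

```python
# Task categories from the paragraph above; the mapping itself is an assumption.
HARD_TASKS = {"math", "debugging", "architecture"}


def pick_thinking_level(task_type: str) -> str:
    """Give hard problems more reasoning budget, everything else less."""
    return "high" if task_type in HARD_TASKS else "low"


def build_config(task_type: str) -> dict:
    # "thinking_config" / "thinking_level" are assumed field names for this
    # sketch; verify them against the SDK version you have installed.
    return {"thinking_config": {"thinking_level": pick_thinking_level(task_type)}}


for task in ("debugging", "formatting"):
    print(task, build_config(task))
```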

Option 1: Google AI Studio (Direct)

The simplest way to get started. Go to aistudio.google.com, sign in with your Google account, and generate an API key. No credit card required for the free tier.

Install the Google Gen AI SDK:

pip install google-genai

Python example:

from google import genai

client = genai.Client(api_key="YOUR_GOOGLE_API_KEY")

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Write a Python function that finds all circular dependencies in a package's import graph."
)

print(response.text)

For streaming responses:

for chunk in client.models.generate_content_stream(
    model="gemini-3.1-pro-preview",
    contents="Explain how B-trees work, with a Python implementation."
):
    print(chunk.text, end="")
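
The loop above prints every chunk. If you'd rather stop reading once you have enough text, a small wrapper is one way to do it. take_until is a hypothetical helper, not part of the SDK, and the fake stream below stands in for the chunk.text values. Whether the server also stops generating when you stop reading is something to verify for your setup.

```python
from typing import Iterable, Iterator, Optional


def take_until(chunks: Iterable[Optional[str]], max_chars: int) -> Iterator[str]:
    """Yield text pieces until max_chars have been emitted, then stop reading."""
    emitted = 0
    for text in chunks:
        if not text:  # skip chunks that carry no text
            continue
        piece = text[: max_chars - emitted]
        yield piece
        emitted += len(piece)
        if emitted >= max_chars:
            break


# Stand-in for the text of streamed chunks, including one empty chunk.
fake_stream = ["B-trees ", "keep ", None, "keys sorted ", "in wide nodes."]
for piece in take_until(fake_stream, 20):
    print(piece, end="")
print()
```

With the real stream you could pass a generator such as (chunk.text for chunk in stream) as the first argument.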

The direct Google SDK works fine, but it locks you into Google's API format. If you want to switch between Gemini, Claude, and GPT-5.4 without rewriting your code, read on.

Option 2: OpenAI-Compatible Endpoint

Most developers already have code written against the OpenAI API format. The good news: you can access Gemini 3.1 Pro through OpenAI-compatible gateways without changing your existing code. Just swap the base URL and model name.

Python (using the OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    api_key="your-gateway-api-key",
    base_url="https://api.kissapi.ai/v1"
)

response = client.chat.completions.create(
    model="gemini-3.1-pro",
    messages=[
        {"role": "system", "content": "You are a senior backend engineer."},
        {"role": "user", "content": "Review this SQL query for N+1 problems and suggest fixes."}
    ],
    temperature=0.7
)

print(response.choices[0].message.content)

Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your-gateway-api-key",
  baseURL: "https://api.kissapi.ai/v1",
});

const response = await client.chat.completions.create({
  model: "gemini-3.1-pro",
  messages: [
    { role: "user", content: "Convert this REST API to GraphQL. Here's the OpenAPI spec: ..." }
  ],
});

console.log(response.choices[0].message.content);

curl:

curl https://api.kissapi.ai/v1/chat/completions \
  -H "Authorization: Bearer your-gateway-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-pro",
    "messages": [
      {"role": "user", "content": "Explain consistent hashing with a Go implementation."}
    ]
  }'

The advantage here is obvious: same code, same SDK, and you can switch to claude-sonnet-4-6 or gpt-5.4 by changing one string. No SDK swaps, no format changes.
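
Since the model is just a string, a fallback chain is easy to bolt on. The sketch below tries each model in order and returns the first answer. complete_with_fallback and the send callable are hypothetical names; in real code, send would wrap client.chat.completions.create and return response.choices[0].message.content. The fake_send here simulates an outage so the example runs offline.

```python
from typing import Callable, Sequence

# Model names as used in this guide; whether your gateway exposes all of them
# is an assumption to check.
MODEL_CHAIN = ("gemini-3.1-pro", "claude-sonnet-4-6", "gpt-5.4")


def complete_with_fallback(
    send: Callable[[str, list], str],
    messages: list,
    models: Sequence[str] = MODEL_CHAIN,
) -> tuple[str, str]:
    """Try each model in order; return (model_used, text) from the first success."""
    errors = []
    for model in models:
        try:
            return model, send(model, messages)
        except Exception as exc:  # broad on purpose: any provider error falls through
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_send(model: str, messages: list) -> str:
    # Simulates the first model being unavailable.
    if model == "gemini-3.1-pro":
        raise TimeoutError("simulated outage")
    return f"answer from {model}"


print(complete_with_fallback(fake_send, [{"role": "user", "content": "ping"}]))
```

Catching every exception keeps the sketch short; in production you'd likely fall back only on timeouts and rate limits, and let bad requests fail loudly.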

Using Gemini 3.1 Pro with Developer Tools

Cursor IDE

Cursor supports custom OpenAI-compatible endpoints. Go to Settings → Models → Add custom model. Enter your gateway URL as the base URL, paste your API key, and set the model name to gemini-3.1-pro. Now you can use Gemini for code completion, chat, and inline edits.

Cline (VS Code)

In Cline's settings, select "OpenAI Compatible" as the provider. Set the base URL to your gateway endpoint and enter the API key. Cline will use Gemini 3.1 Pro for its agentic coding workflow — file edits, terminal commands, the whole thing.

Claude Code / Codex CLI

Both Claude Code and OpenAI's Codex CLI support custom endpoints through environment variables:

# For Claude Code (the gateway must accept Anthropic-format requests)
export ANTHROPIC_BASE_URL=https://api.kissapi.ai
export ANTHROPIC_API_KEY=your-key

# For Codex CLI
export OPENAI_BASE_URL=https://api.kissapi.ai/v1
export OPENAI_API_KEY=your-key

Note: Claude Code sends Anthropic-format requests and expects Anthropic-format responses. To use Gemini through Claude Code, you'll need a gateway that translates between formats. Otherwise, use Codex CLI, which natively speaks the OpenAI format.

When to Use Gemini 3.1 Pro (and When Not To)

No model is best at everything. Here's where Gemini 3.1 Pro earns its keep — and where you should reach for something else.

Gemini 3.1 Pro Wins

Long-context tasks: analyzing entire codebases, long documents, multi-file diffs. The 1M window is a genuine advantage, not a marketing number.
Multimodal workflows: mixing images, audio, or video with text prompts. Screenshot-to-code, diagram analysis, video summarization.
Cost-sensitive production workloads: at $2/$12, it's the cheapest of the three frontier-tier models compared here. If you're processing millions of tokens daily, the savings add up.
Agent tool use: native function calling is clean and reliable. Less prompt engineering needed compared to some competitors.

Consider Alternatives

Pure coding tasks: Claude Sonnet 4.6 still edges out Gemini on SWE-bench and real-world code generation. If coding is your primary use case and context length isn't a bottleneck, Claude is hard to beat.
Creative writing: GPT-5.4 tends to produce more natural, varied prose. Gemini can feel a bit clinical in long-form writing.
Budget-constrained simple tasks: DeepSeek V3.2 at $0.27/$1.10 or Gemini Flash at $0.15/$0.60 are dramatically cheaper for classification, extraction, and formatting.

Practical Tips for Gemini 3.1 Pro

Use the full context window strategically. Just because you can send 1M tokens doesn't mean you should. More context = more cost. Send what's relevant. But when you genuinely need to analyze a large codebase or document set, don't chunk it — let the model see everything at once.
Enable thinking mode for hard problems. For straightforward tasks, skip it — it adds latency and token cost. For debugging, architecture decisions, or math, the reasoning step pays for itself in accuracy.
Combine with cheaper models. Use Gemini 3.1 Pro for the heavy lifting (analysis, complex generation) and route simple tasks to Flash or Flash-Lite. A model router that picks the right model per request can cut your bill by 60-70%.
Watch the output token ratio. At $12/M output tokens, verbose responses get expensive. Set max_tokens (max_output_tokens in Google's SDK) appropriately and use system prompts that encourage concise answers when you don't need essays.
Use streaming for interactive apps. Gemini 3.1 Pro's time-to-first-token is competitive. Streaming gives your users something to read while the model works, and lets you cancel early if the response goes sideways.
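
The "combine with cheaper models" tip can be sketched as a tiny router. Prices come from the pricing table; the task labels, the routing table, the model keys, and the sample traffic are all hypothetical, so treat the resulting numbers as an illustration of the mechanism rather than a forecast for your bill.

```python
# Prices in USD per 1M tokens (input, output), from the pricing table.
# Keys are labels for this sketch, not guaranteed API model IDs.
PRICES = {
    "gemini-3.1-pro": (2.00, 12.00),
    "gemini-3.0-flash": (0.15, 0.60),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

# Hypothetical routing table: heavy lifting on Pro, simple tasks on Flash tiers.
ROUTES = {
    "analysis": "gemini-3.1-pro",
    "generation": "gemini-3.1-pro",
    "extraction": "gemini-3.0-flash",
    "classification": "gemini-2.5-flash-lite",
}


def route(task_type: str) -> str:
    """Pick a model for the task; unknown task types default to Pro."""
    return ROUTES.get(task_type, "gemini-3.1-pro")


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def compare(requests: list[tuple[str, int, int]]) -> tuple[float, float]:
    """Return (cost if everything ran on Pro, cost with routing)."""
    all_pro = sum(cost("gemini-3.1-pro", i, o) for _, i, o in requests)
    routed = sum(cost(route(t), i, o) for t, i, o in requests)
    return all_pro, routed


# Made-up traffic: one big analysis job plus 200 small classification calls.
traffic = [("analysis", 100_000, 2_000)] + [("classification", 2_000, 100)] * 200
all_pro, routed = compare(traffic)
print(f"all Pro: ${all_pro:.3f}, routed: ${routed:.3f}")
```

How much you save depends entirely on how much of your traffic is simple enough for the cheaper tiers.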

Gemini 3.1 Pro vs GPT-5.4 vs Claude Sonnet 4.6: Quick Comparison

Feature	Gemini 3.1 Pro	GPT-5.4	Claude Sonnet 4.6
Input price	$2.00/M	$2.50/M	$3.00/M
Output price	$12.00/M	$15.00/M	$15.00/M
Context window	1,000,000	256,000	200,000
Multimodal	Text, image, audio, video	Text, image, audio	Text, image
Thinking mode	Yes (levels)	Yes (o-series)	Yes (extended)
Native tool use	Yes	Yes	Yes
Code generation	Strong	Strong	Strongest
Long-context recall	Strongest	Good	Good

The honest take: there's no single "best" model anymore. Gemini 3.1 Pro is the best value for long-context and multimodal work. Claude leads on coding. GPT-5.4 is the safest all-rounder. Smart developers use all three and route by task.

Try Gemini 3.1 Pro API Today

Access Gemini 3.1 Pro, Claude, GPT-5.4, and 200+ models through one OpenAI-compatible API. Pay-as-you-go, no subscription.

FAQ

Is Gemini 3.1 Pro still in preview?

As of April 2026, the model ID is gemini-3.1-pro-preview. Google hasn't announced a stable release date yet, but the preview has been reliable in production for most use cases. Rate limits are slightly lower than stable models.

Can I use Gemini 3.1 Pro through the OpenAI SDK?

Google's native API uses a different format, so the OpenAI SDK won't talk to it out of the box. Through an OpenAI-compatible endpoint, such as a gateway like KissAPI, you can use the standard OpenAI Python or Node.js SDK with Gemini. Just change the base URL and model name.

How does the 1M context window affect pricing?

You pay per token, so a 1M-token prompt costs $2.00 for input alone. The context window is a maximum, not a minimum. For most requests, you'll use far less. The advantage is that you can go big when you need to, without chunking or summarization hacks.

Does Gemini 3.1 Pro support function calling?

Yes, natively. You define function schemas in your request, and the model will generate structured function calls when appropriate. It's one of the cleanest function-calling implementations available, with fewer hallucinated parameters than some competitors.
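
Even with a well-behaved model, it's cheap to check a call's arguments against your schema before executing anything. Here's a minimal sketch that only looks at required and unexpected parameter names. validate_arguments is a hypothetical helper and the schema is an example; a full JSON Schema validator would also check types.

```python
def validate_arguments(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the names match the schema."""
    problems = []
    allowed = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name in args:
        if name not in allowed:
            problems.append(f"unexpected parameter: {name}")
    return problems


# Example schema for a hypothetical get_weather function.
schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

print(validate_arguments(schema, {"city": "Oslo"}))
print(validate_arguments(schema, {"town": "Oslo"}))
```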