Gemini 3 Pro Image API Cost Routing Guide (2026): Pro vs Flash for Developers
On June 18, 2026, API provider listings started showing two new Google image models side by side: Gemini 3 Pro Image, also labeled Nano Banana Pro, and Gemini 3.1 Flash Image, also labeled Nano Banana 2. Price trackers list Pro around $2.00 per million input tokens and $12.00 per million output tokens, while Flash is listed around $0.50 input and $3.00 output, with larger context on the Flash side.
That split matters. Image generation is no longer a single-model decision. If every request goes to the prettiest model, your bill balloons. If every request goes to the cheapest model, your product looks cheap. The useful answer is routing: send the expensive jobs to Pro, send drafts and bulk work to Flash, and measure the retry tax.
This guide is for developers building image generation into products, not people making one-off social posts. We'll cover routing rules, cost controls, request patterns, and a few production mistakes that quietly eat budget.
The News Hook: Two Image Models, Two Different Jobs
The June 18 listings are interesting because they put the tradeoff in plain sight. Gemini 3 Pro Image is the premium path. It is the model you reach for when detail, composition, and brand quality matter. Gemini 3.1 Flash Image is the throughput path. It is cheaper, faster to experiment with, and better suited to many small generations.
| Model | Best use | Listed API pricing | Routing stance |
|---|---|---|---|
| Gemini 3 Pro Image | Final assets, ads, app store shots, polished hero images | ~$2 input / ~$12 output per 1M tokens | Escalate only when quality matters |
| Gemini 3.1 Flash Image | Drafts, variants, thumbnails, internal previews | ~$0.50 input / ~$3 output per 1M tokens | Default for exploration |
Pricing pages can change, so don't hard-code these numbers into dashboards. Pull the latest rate card into config and keep a manual override. But the ratio is the key point: Pro costs roughly four times as much as Flash in those listings, on both input and output tokens. At similar token counts, one unnecessary retry on Pro costs about as much as four Flash drafts.
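To make the ratio concrete, here is a minimal sketch of a config-driven cost estimate. The rates are the listed figures quoted above. The `Rate` and `RATE_CARD` structures, the `estimate_cost` helper, and the token counts in the usage lines are illustrative assumptions, not values from any provider's documentation.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Rate:
    """Price in USD per one million tokens."""
    input_per_m: float
    output_per_m: float


# Listed prices quoted in this guide. In a real system, load these from
# config so they can be updated or overridden without a deploy.
RATE_CARD = {
    "gemini-3-pro-image": Rate(input_per_m=2.00, output_per_m=12.00),
    "gemini-3.1-flash-image": Rate(input_per_m=0.50, output_per_m=3.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  attempts: int = 1,
                  rate_card: Mapping[str, Rate] = RATE_CARD) -> float:
    """Estimate USD cost for a request, counting every attempt as billed."""
    rate = rate_card[model]
    per_attempt = (input_tokens * rate.input_per_m
                   + output_tokens * rate.output_per_m) / 1_000_000
    return per_attempt * attempts


# Hypothetical token counts, chosen only to illustrate the ratio.
pro_with_retry = estimate_cost("gemini-3-pro-image", 500, 2_000, attempts=2)
flash_draft = estimate_cost("gemini-3.1-flash-image", 500, 2_000)
print(f"Pro with one retry: ${pro_with_retry:.5f}")
print(f"Single Flash draft: ${flash_draft:.5f}")
```

Under these assumptions, a Pro request that retries once costs as much as eight Flash drafts, which is why the retry count belongs in the estimate rather than being treated as noise.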
A Practical Routing Policy
I like a three-stage policy because it maps to how humans actually design images:
- Draft: Use Flash for the first prompt, rough layouts, and multiple composition options.
- Select: Let the user or your ranking logic pick one direction.
- Polish: Use Pro only for the final pass, high-resolution output, or brand-sensitive work.
Defaulting to Pro is a lazy architecture. Defaulting to Flash and escalating with intent is the better production pattern.
This also improves user experience. People rarely know exactly what they want on the first attempt. Give them cheap exploration, then spend serious tokens when the direction is clear.
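The three stages can be sketched as a small pipeline. This is an illustration under stated assumptions, not a definitive implementation: `draft_select_polish`, the `generate` and `pick` callables, and the way the final prompt refers back to the chosen draft are all hypothetical. A real integration would likely pass the chosen draft image or its settings to the final call.

```python
from typing import Callable, Sequence

FLASH = "gemini-3.1-flash-image"
PRO = "gemini-3-pro-image"

# generate(model, prompt) returns some reference to the image (URL, ID, ...).
Generate = Callable[[str, str], str]
# pick(drafts) returns the index of the chosen draft (user click or ranker).
Pick = Callable[[Sequence[str]], int]


def draft_select_polish(prompt: str, generate: Generate, pick: Pick,
                        n_drafts: int = 4) -> dict:
    # Draft: cheap exploration on Flash.
    drafts = [generate(FLASH, f"{prompt} (composition option {i + 1})")
              for i in range(n_drafts)]
    # Select: the user or ranking logic commits to one direction.
    chosen = pick(drafts)
    # Polish: exactly one Pro call, made only after a direction is chosen.
    final = generate(
        PRO, f"{prompt} (composition option {chosen + 1}), final polished render"
    )
    return {"drafts": drafts, "chosen": chosen, "final": final}


# Usage with a fake generator that records which model each call used.
calls = []


def fake_generate(model: str, prompt: str) -> str:
    calls.append(model)
    return f"{model}#{len(calls)}"


result = draft_select_polish("Dark SaaS dashboard", fake_generate,
                             pick=lambda drafts: 0)
print(calls.count(FLASH), calls.count(PRO))
```

Injecting `generate` and `pick` keeps the policy testable without network calls, and makes the Pro call count per user session easy to assert.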
OpenAI-Compatible Request Shape
If your gateway exposes image models through an OpenAI-compatible interface, keep the call simple. Here is a curl pattern you can adapt:
curl https://api.kissapi.ai/v1/images/generations \
  -H "Authorization: Bearer $KISSAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.1-flash-image",
    "prompt": "Dark SaaS dashboard showing API cost routing for image models",
    "size": "1536x864",
    "quality": "medium"
  }'
Then reserve the Pro model for the final pass:
curl https://api.kissapi.ai/v1/images/generations \
  -H "Authorization: Bearer $KISSAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-pro-image",
    "prompt": "Final polished hero image, dark SaaS dashboard, neon purple and cyan accents, clean typography, production-ready marketing asset",
    "size": "1536x864",
    "quality": "high"
  }'
KissAPI is useful here because the rest of your app can keep one API shape while you switch models behind a routing rule. That matters more than it sounds. Cost control gets much easier when routing is config, not a product rewrite.
Python: Route by Job Type
Start with explicit job types. Don't ask the model to decide which model to use. Your application already knows whether a request is a draft, thumbnail, or final asset.
import os

import requests

API_URL = "https://api.kissapi.ai/v1/images/generations"
API_KEY = os.environ["KISSAPI_API_KEY"]  # raises KeyError early if the key is unset

# Explicit routing table: Flash for exploration, Pro for final output.
MODEL_BY_JOB = {
    "draft": "gemini-3.1-flash-image",
    "variant": "gemini-3.1-flash-image",
    "thumbnail": "gemini-3.1-flash-image",
    "final": "gemini-3-pro-image",
    "brand_asset": "gemini-3-pro-image",
}


def generate_image(prompt: str, job_type: str, size: str = "1536x864") -> dict:
    # Unknown job types fall back to the cheaper Flash model.
    model = MODEL_BY_JOB.get(job_type, "gemini-3.1-flash-image")
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": model,
            "prompt": prompt,
            "size": size,
            # Spend on quality only when the job was routed to Pro.
            "quality": "high" if model.endswith("pro-image") else "medium",
        },
        timeout=90,
    )
    response.raise_for_status()  # raise on HTTP errors instead of returning them
    return response.json()
The boring dictionary is the point. You can audit it. You can test it. You can change it without retraining anyone.
Node.js: Add a Retry Budget
Retries are where image costs get ugly. A text model retry may be cheap. A premium image retry can be the whole margin on a low-price product tier.
const endpoint = "https://api.kissapi.ai/v1/images/generations";

// Route by explicit job type; anything unrecognized goes to Flash.
const routeModel = (jobType) => {
  if (["final", "brand_asset", "paid_ad"].includes(jobType)) {
    return "gemini-3-pro-image";
  }
  return "gemini-3.1-flash-image";
};

async function generateImage({ prompt, jobType, maxRetries = 1 }) {
  const model = routeModel(jobType);
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.KISSAPI_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        model,
        prompt,
        size: "1536x864",
        quality: model.includes("pro") ? "high" : "medium"
      })
    });
    if (res.ok) return await res.json();
    lastError = await res.text();
    // Retry budget: never retry Pro automatically; fail and let the user decide.
    if (model.includes("pro")) break;
  }
  throw new Error(`Image generation failed: ${lastError}`);
}
Notice the rule: Pro is never retried automatically. If the user asked for a paid ad or a final hero asset, show a clear failure and let them retry intentionally. Silent retries feel nice until finance reads the invoice.
Cost Controls That Actually Work
- Cache prompt hashes. If the exact same user prompt appears again, reuse the previous asset or ask whether they want a variation.
- Store generated assets. Don't regenerate images because your app forgot the URL.
- Separate drafts from finals. A draft button and a final-render button should not call the same model.
- Limit batch size. Four variants is usually enough. Twelve variants is usually a product manager hiding indecision in your API bill.
- Track retry cost by model. Retry rate matters more than list price once you run real traffic.
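The first two controls can be sketched as a small cache keyed by a prompt hash. This is a minimal sketch under stated assumptions: `prompt_hash`, `AssetCache`, and the in-memory dictionary are hypothetical names and choices, and collapsing whitespace before hashing is an assumption about what counts as "the same prompt". A production version would persist assets in a database or object store.

```python
import hashlib
from typing import Callable, Dict


def prompt_hash(model: str, prompt: str, size: str) -> str:
    """Stable key for a request. Whitespace is collapsed (an assumption)."""
    normalized = " ".join(prompt.split())
    key = f"{model}|{size}|{normalized}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class AssetCache:
    """In-memory stand-in for a persistent asset store."""

    def __init__(self, generate: Callable[[str, str, str], str]):
        self._generate = generate          # generate(model, prompt, size) -> asset URL
        self._assets: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_generate(self, model: str, prompt: str,
                        size: str = "1536x864") -> str:
        key = prompt_hash(model, prompt, size)
        if key in self._assets:
            self.hits += 1                 # reuse: no API call, no cost
            return self._assets[key]
        self.misses += 1
        asset = self._generate(model, prompt, size)
        self._assets[key] = asset          # store so the URL is never "forgotten"
        return asset


# Usage with a fake generator that counts real generations.
generated = []


def fake_generate(model: str, prompt: str, size: str) -> str:
    generated.append((model, prompt))
    return f"asset-{len(generated)}"


cache = AssetCache(fake_generate)
first = cache.get_or_generate("gemini-3.1-flash-image", "dark  dashboard")
second = cache.get_or_generate("gemini-3.1-flash-image", "dark dashboard ")
print(first == second, cache.hits, cache.misses)
```

The model and size are part of the key on purpose, so a cached Flash draft is never served in place of a Pro final render.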
What to Measure
At minimum, log these fields for every image request:
| Field | Why it matters |
|---|---|
| job_type | Shows whether routing matches product intent |
| model | Lets you compare Pro vs Flash spend |
| attempt_count | Surfaces hidden retry tax |
| prompt_hash | Finds duplicate generations |
| accepted_by_user | Tells you whether expensive generations are actually better |
The last one is underrated. If Pro assets aren't accepted at a higher rate than Flash assets in a given workflow, stop paying for Pro there.
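As a rough sketch of that comparison, the fields above can be captured in one record and grouped by workflow and model. The `ImageRequestLog` dataclass and `acceptance_by_workflow` helper are hypothetical names, and the sample rows are made up purely to show the calculation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ImageRequestLog:
    job_type: str
    model: str
    attempt_count: int
    prompt_hash: str
    accepted_by_user: bool


def acceptance_by_workflow(
    logs: Iterable[ImageRequestLog],
) -> Dict[Tuple[str, str], float]:
    """Acceptance rate per (job_type, model) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    accepted: Dict[Tuple[str, str], int] = defaultdict(int)
    for entry in logs:
        key = (entry.job_type, entry.model)
        totals[key] += 1
        accepted[key] += int(entry.accepted_by_user)
    return {key: accepted[key] / totals[key] for key in totals}


# Made-up sample rows, for illustration only.
PRO = "gemini-3-pro-image"
FLASH = "gemini-3.1-flash-image"
sample_logs = [
    ImageRequestLog("thumbnail", PRO, 1, "a1", True),
    ImageRequestLog("thumbnail", PRO, 2, "a2", False),
    ImageRequestLog("thumbnail", FLASH, 1, "a3", True),
    ImageRequestLog("thumbnail", FLASH, 1, "a4", False),
]
rates = acceptance_by_workflow(sample_logs)
for (job_type, model), rate in sorted(rates.items()):
    print(f"{job_type:10s} {model:24s} {rate:.0%}")
```

In this made-up sample, Pro and Flash thumbnails are accepted at the same rate, which is the signal to route thumbnails to Flash. Real decisions need far more rows than this before the rates mean anything.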
Where KissAPI Fits
Most teams don't want to rewrite their image pipeline every time Google, OpenAI, or another provider changes a model name. A unified endpoint helps you keep the application stable while experimenting with the model layer. With KissAPI, you can keep one OpenAI-compatible integration, route image jobs by config, and pair that with tools like the API cost calculator and token counter before shipping a new generation feature.
FAQ
What is Gemini 3 Pro Image best for?
Use it for high-value output: product hero images, app store screenshots, paid ad creatives, brand campaigns, and detailed visual work where a better first result reduces manual editing.
When should I use Gemini 3.1 Flash Image?
Use Flash for drafts, thumbnails, bulk variations, internal previews, and early prompt exploration. It is the sensible default when the user has not committed to a direction yet.
Should I expose both models directly to users?
Usually no. Expose product-level actions like Draft, More Variants, and Final Render. Then route those actions to the right model internally.
Build Image Features Without Guessing the Bill
Create a free account at kissapi.ai/register and route image, chat, and coding models behind one OpenAI-compatible API.