Gemini Omni Flash API Guide (2026): Video Generation, Conversational Editing & Pricing

On June 30, 2026, Google finally put Gemini Omni Flash behind an API. That's the part that matters for developers. The model debuted to consumers at I/O 2026 back in May, but with no programmatic interface it was a prosumer toy, not something you could ship. Now it's live in the Gemini API, Google AI Studio, and the Gemini Enterprise Agent Platform, and it costs $0.10 per second of generated video.

Google shipped a second model the same day: Nano Banana 2 Lite (formally Gemini 3.1 Flash Lite Image), a cheaper, faster image generator. If you're building anything that touches generative media, this is the release to pay attention to this week. Here's what actually changed and how to wire it into real code.

TL;DR / Key Takeaways
  • Google released Gemini Omni Flash to developers through the Gemini API on June 30, 2026.
  • Gemini Omni Flash costs $0.10 per second of generated 720p video, so a 10-second clip costs about $1.00.
  • Gemini Omni Flash accepts text, image, and video inputs and returns a finished clip with synced audio in one call.
  • Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image) launched the same day at roughly $0.25 per 1M input tokens and $1.50 per 1M output tokens.
  • Gemini Omni Flash's $0.10 per second price matches Google's Veo 3.1 Fast tier, but adds conversational, multi-turn editing.

What "Omni" Actually Means Here

Gemini Omni Flash is the first model in Google's new "Omni" family. The pitch is "anything from any input," and for now that means video from a mix of text, images, and existing clips. But the headline feature isn't a sharper text-to-video prompt. It's conversational editing.

A typical generative video pipeline chains five tools: an LLM for the script, a text-to-image model, an image-to-video model, a lip-sync tool, and a voice generator. Each has its own billing and data path. Omni Flash collapses that into one model that takes your inputs and returns a clip with synced audio. Then you edit it by talking to it: relight a product shot, reframe it, swap the wardrobe, or translate on-screen signage, and each instruction builds on the last instead of regenerating from scratch.

Pricing: What You'll Actually Pay

Pricing landed with the API, and it's aggressive. Video output is billed by the second, and inputs are billed at standard Gemini token rates.

| Model | Output price | Input price | Notes |
|---|---|---|---|
| Gemini Omni Flash (video) | $0.10 per second of 720p video | Standard Gemini token rates for text, image, and video inputs | ~$1.00 per 10-second clip; synced audio included |
| Nano Banana 2 Lite (image) | $1.50 per 1M output tokens | $0.25 per 1M input tokens | Gemini 3.1 Flash Lite Image; built for high throughput |
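The Nano Banana 2 Lite row bills per token rather than per second, so its cost is a weighted sum of input and output tokens. Here is a minimal sketch of that arithmetic using the per-million rates from the table. The helper name is hypothetical, and the token counts in the usage line are illustrative only, because real counts depend on your prompt and on how many tokens an output image consumes.

```python
from decimal import Decimal

# Rates from the pricing table above, in USD per 1M tokens; re-check current pricing before relying on them.
NANO_BANANA_2_LITE_INPUT_PER_M = Decimal("0.25")
NANO_BANANA_2_LITE_OUTPUT_PER_M = Decimal("1.50")
ONE_MILLION = Decimal(1_000_000)


def image_call_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one Nano Banana 2 Lite call from its token counts (hypothetical helper)."""
    input_cost = input_tokens * NANO_BANANA_2_LITE_INPUT_PER_M / ONE_MILLION
    output_cost = output_tokens * NANO_BANANA_2_LITE_OUTPUT_PER_M / ONE_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative counts only: a 1,000-token prompt and 1,290 output tokens.
    print(image_call_cost(1_000, 1_290))
```

Decimal keeps fractions of a cent exact, which matters once you sum thousands of calls.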

The mental model: budget video by wall-clock duration, not by tokens. A 30-second explainer runs about $3.00 in output cost before you count input tokens. Compare that to a traditional shoot with a crew and revision rounds, and the math changes what internal video is even worth making.
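Budgeting by duration reduces to one multiplication, and its inverse tells you how much footage a budget buys. A small sketch, assuming the $0.10-per-second 720p output rate quoted above; the function names are hypothetical, and input-token charges are deliberately left out because they depend on standard Gemini rates for whatever you send in.

```python
from decimal import Decimal

# Output rate from the article: USD per second of generated 720p video.
OMNI_FLASH_USD_PER_SECOND = Decimal("0.10")


def output_cost(seconds: int) -> Decimal:
    """Output cost of one clip; input tokens are billed separately and not included."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return seconds * OMNI_FLASH_USD_PER_SECOND


def seconds_within_budget(budget_usd: Decimal) -> int:
    """Whole seconds of output a budget covers, rounding down."""
    if budget_usd < 0:
        raise ValueError("budget must be non-negative")
    return int(budget_usd // OMNI_FLASH_USD_PER_SECOND)


if __name__ == "__main__":
    print(output_cost(30))  # -> 3.00, the 30-second explainer above
    print(seconds_within_budget(Decimal("1.05")))  # -> 10
```

Using Decimal instead of float avoids artifacts such as 3 * 0.1 evaluating to 0.30000000000000004 in cost reports.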

How It Stacks Up Against Veo 3.1 Fast

Google already sells video generation through Veo. Omni Flash matches Veo 3.1 Fast on price but changes the workflow.

| Attribute | Gemini Omni Flash | Veo 3.1 Fast |
|---|---|---|
| Output price | $0.10 per second of video | $0.10 per second of video |
| Core workflow | Conversational, multi-turn editing of an existing clip | Prompt-to-video generation |
| Inputs | Text, images, and existing video clips combined | Text and image prompts |
| Best for | Iterative edits, brand asset insertion, localized on-screen text | Fast one-shot generation from a prompt |
| Key limitation | Sign tracking in complex scenes can slip; output still needs human review | No conversational refinement of a finished clip |

If your team regenerates entire videos just to change one line of on-screen text, Omni Flash is the interesting one. If you just need fast clips from a prompt, Veo Fast is fine.

Minimal curl Call

The Gemini API exposes Omni Flash as gemini-omni-flash-preview. Here's a compact generation request. Set your key in GEMINI_API_KEY first.

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-omni-flash-preview:generateContent?key=$GEMINI_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "contents": [{
      "parts": [{
        "text": "A 6-second product shot of a matte-black water bottle on a wet stone counter, soft morning light, gentle camera push-in."
      }]
    }],
    "generationConfig": {
      "responseModalities": ["VIDEO"],
      "videoConfig": {"durationSeconds": 6, "resolution": "720p"}
    }
  }'

At $0.10 per second, that 6-second clip costs roughly $0.60 in output. Keep durations tight during iteration and only scale up once the composition is locked.
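If you would rather assemble the same request from Python without an SDK, the body is plain JSON. A minimal sketch using only the standard library: it builds the request but does not send it. The model ID and the `videoConfig` field names are copied from the curl example above and are not independently verified. The key goes in the `x-goog-api-key` header, which the Gemini API also accepts and which keeps the key out of URLs and logs.

```python
import json
import os
import urllib.request

# Model ID and videoConfig field names come from the article's curl example (unverified here).
MODEL = "gemini-omni-flash-preview"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"


def build_video_request(prompt: str, seconds: int, api_key: str,
                        resolution: str = "720p") -> urllib.request.Request:
    """Build (but do not send) the same generateContent request as the curl call."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["VIDEO"],
            "videoConfig": {"durationSeconds": seconds, "resolution": resolution},
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_video_request(
        "A 6-second product shot of a matte-black water bottle on a wet stone counter.",
        seconds=6,
        api_key=os.environ.get("GEMINI_API_KEY", ""),
    )
    # Send it with urllib.request.urlopen(req) once GEMINI_API_KEY is set.
    print(req.full_url)
```

Keeping the builder separate from the network call makes it easy to log or diff payloads while you iterate on durations.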

Python: Generate, Then Edit by Conversation

The pattern that saves money is generating a short draft, then editing it with follow-up instructions instead of regenerating. This keeps the parts that already work.

import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# 1) Generate a short draft
draft = client.models.generate_content(
    model="gemini-omni-flash-preview",
    contents="A 5-second clip: a red sneaker rotating on a white pedestal, studio light.",
    config=types.GenerateContentConfig(
        response_modalities=["VIDEO"],
        video_config=types.VideoConfig(duration_seconds=5, resolution="720p"),
    ),
)

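def first_inline_part(response):
    """Return the first part carrying inline media (the clip), skipping any text parts."""
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return part
    raise ValueError("response contained no inline media part")

video_ref = first_inline_part(draft)

# 2) Edit conversationally: add rain and reflections, keep the rest
edit = client.models.generate_content(
    model="gemini-omni-flash-preview",
    contents=[
        video_ref,
        "Add light rain and wet-floor reflections. Keep the sneaker, angle, and lighting.",
    ],
    config=types.GenerateContentConfig(response_modalities=["VIDEO"]),
)

# Save the edited clip; in the Python SDK, inline_data.data holds raw bytes
with open("edited.mp4", "wb") as f:
    f.write(first_inline_part(edit).inline_data.data)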

Because each edit builds on the last clip, you don't pay to re-derive the whole scene. That's the practical cost lever with this model.
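Chaining edits is mostly bookkeeping: always feed the newest clip, and record each instruction so you can see how a result was reached. A minimal sketch; `EditSession` and the `edit_fn` hook are hypothetical names. In a real pipeline `edit_fn` would wrap `client.models.generate_content` with the previous clip part and the instruction, as in the Python example above. Here any callable works, so the chaining logic can be checked offline.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

# Hypothetical hook: takes (previous_clip, instruction) and returns the edited clip.
EditFn = Callable[[Any, str], Any]


@dataclass
class EditSession:
    """Holds the latest clip and every instruction applied to reach it."""
    current: Any
    history: List[str] = field(default_factory=list)

    def apply(self, instruction: str, edit_fn: EditFn) -> Any:
        # Feed the most recent clip, not the original draft, so earlier edits are preserved.
        self.current = edit_fn(self.current, instruction)
        self.history.append(instruction)
        return self.current


if __name__ == "__main__":
    # A fake editor that appends text stands in for the real API call.
    fake_edit = lambda clip, text: f"{clip}+{text}"
    session = EditSession(current="draft")
    session.apply("add light rain", fake_edit)
    session.apply("warmer light", fake_edit)
    print(session.current)
    print(session.history)
```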

Node.js: Insert a Brand Asset by Reference

Omni Flash accepts reference images and carries their coloring and rough shape into the result, which is what makes brand insertion useful.

import { GoogleGenAI } from "@google/genai";
import { readFileSync, writeFileSync } from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const logo = readFileSync("brand-logo.png").toString("base64");

const res = await ai.models.generateContent({
  model: "gemini-omni-flash-preview",
  contents: [
    { inlineData: { mimeType: "image/png", data: logo } },
    { text: "8-second clip of a cafe counter. Place this logo on the takeaway cup, natural lighting, subtle motion." },
  ],
  config: {
    responseModalities: ["VIDEO"],
    videoConfig: { durationSeconds: 8, resolution: "720p" },
  },
});

const part = res.candidates[0].content.parts.find((p) => p.inlineData);
writeFileSync("cafe.mp4", Buffer.from(part.inlineData.data, "base64"));

One caveat worth repeating: the match isn't pixel-perfect, and text or logo tracking can drift between frames in busy scenes. Treat the output as a strong first cut, not a final deliverable. A human still signs off before it ships.
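The Node.js example base64-encodes the logo before sending it. If you call the REST endpoint from the curl section instead of an SDK, the reference image travels as an `inlineData` part with the same encoding. A minimal Python sketch; the helper names are hypothetical, and the MIME type is guessed from the file extension, so a mislabeled file is rejected rather than sent.

```python
import base64
import mimetypes
from pathlib import Path


def inline_image_part(filename: str, data: bytes) -> dict:
    """Build a REST-style inlineData part from image bytes; MIME type comes from the extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognizable image file: {filename}")
    encoded = base64.b64encode(data).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}


def inline_image_part_from_file(path: str) -> dict:
    """Convenience wrapper that reads the image from disk."""
    return inline_image_part(path, Path(path).read_bytes())


if __name__ == "__main__":
    # The part sits next to the text instruction in the same parts list.
    parts = [
        inline_image_part_from_file("brand-logo.png"),
        {"text": "8-second clip of a cafe counter. Place this logo on the takeaway cup."},
    ]
    print(parts[0]["inlineData"]["mimeType"])
```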

Where This Fits in a Real Stack

Video generation is bursty. You'll batch a dozen clips before a campaign, then go quiet for a week. That spikiness is exactly where a single hardcoded provider hurts, because a rate limit or a regional outage stalls the whole batch. Routing media and text calls through an OpenAI-compatible layer like KissAPI lets you keep one integration while swapping models underneath, and it gives you a fallback route when a provider throttles you mid-batch.
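A routing layer handles failover for you, but the underlying logic is worth seeing in isolation. A minimal sketch of generic retry-then-fallback, not KissAPI's or Google's implementation: it assumes each route is a zero-argument callable you write around a provider client, and that it raises the hypothetical `RateLimited` exception when throttled.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error your provider wrapper raises on HTTP 429 or a quota rejection."""


def call_with_fallback(
    routes: Sequence[Callable[[], T]],
    retries_per_route: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each route in order, backing off on rate limits before moving to the next route."""
    last_error: Optional[Exception] = None
    for route in routes:
        for attempt in range(retries_per_route):
            try:
                return route()
            except RateLimited as err:
                last_error = err
                # Exponential backoff between retries on the same route; no wait before switching routes.
                if attempt < retries_per_route - 1:
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all routes were rate limited") from last_error
```

Injecting `sleep` keeps the function testable and lets a batch job swap in its own scheduler. Only rate limits trigger a retry; other errors propagate immediately, since a malformed request will fail on every route.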

The other reason to abstract the endpoint: pricing on generative media is moving fast. Omni Flash matched Veo Fast on day one, and Nano Banana 2 Lite undercut the previous image tier. If your code talks to one fixed model name, every price change is a refactor. If it talks to a routing layer, it's a config edit.
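Concretely, "a config edit" can be as small as reading model IDs from configuration instead of hardcoding them. A minimal sketch; the `MODEL_CONFIG` environment variable and the role names are hypothetical, only the video ID comes from this article, and swapping an ID only works if the replacement model accepts the same request shape.

```python
import json
import os
from typing import Dict, Optional

# Only the video ID is from this article; add other roles with the IDs your provider documents.
DEFAULT_MODELS: Dict[str, str] = {"video": "gemini-omni-flash-preview"}


def load_models(config_json: Optional[str] = None) -> Dict[str, str]:
    """Merge model overrides from JSON (or the hypothetical MODEL_CONFIG env var) over the defaults."""
    raw = config_json if config_json is not None else os.environ.get("MODEL_CONFIG", "{}")
    overrides = json.loads(raw)
    if not isinstance(overrides, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in overrides.items()
    ):
        raise ValueError("model config must map role names to model ID strings")
    return {**DEFAULT_MODELS, **overrides}


if __name__ == "__main__":
    models = load_models()
    print(models["video"])
```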

Rule of thumb: during iteration, generate short 4-6 second drafts and refine with conversational edits. Only render full-length, higher-cost clips once the composition, text, and branding are locked. At $0.10 per second, that discipline is the difference between a $5 test loop and a $50 one.
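The rule of thumb can be enforced in code so drafts never run long by accident. A minimal sketch; the 4-6 second window and the $0.10 rate come from this section, while the function names and loop sizes are illustrative.

```python
from decimal import Decimal

# $0.10 per second of 720p output, per the article; input tokens are excluded.
USD_PER_SECOND = Decimal("0.10")
DRAFT_MIN_SECONDS, DRAFT_MAX_SECONDS = 4, 6


def draft_duration(target_seconds: int) -> int:
    """Clamp a requested duration into the 4-6 second draft window."""
    return max(DRAFT_MIN_SECONDS, min(DRAFT_MAX_SECONDS, target_seconds))


def loop_cost(renders: int, seconds_each: int) -> Decimal:
    """Output cost of a test loop of identically sized renders."""
    return renders * seconds_each * USD_PER_SECOND


if __name__ == "__main__":
    print(loop_cost(10, draft_duration(5)))  # ten 5-second drafts
    print(loop_cost(10, 50))                 # ten 50-second renders
```

With ten renders per loop, 5-second drafts cost $5.00 in output, while 50-second renders cost $50.00.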

FAQ

How much does Gemini Omni Flash cost per video?

Gemini Omni Flash costs $0.10 per second of generated 720p video, the same as Google's Veo 3.1 Fast tier. A 10-second clip costs about $1.00. Text, image, and video inputs are billed separately at standard Gemini token rates.

When was Gemini Omni Flash released to developers?

Google made Gemini Omni Flash available to developers through the Gemini API on June 30, 2026. It first debuted to consumers at Google I/O 2026 in May, but had no programmatic interface until the June 30 rollout.

What is the difference between Gemini Omni Flash and Nano Banana 2 Lite?

Gemini Omni Flash generates and edits video with synced audio at $0.10 per second. Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image) generates still images at roughly $0.25 per 1M input tokens and $1.50 per 1M output tokens. Both went live for developers on June 30, 2026.

Route Video and Text Calls Through One Endpoint

Create a free account at api.kissapi.ai/register and keep an OpenAI-compatible route ready so a provider throttle never stalls your render batch.
