Google ships Nano Banana 2 Lite and Gemini Omni Flash to developers

The new Gemini media models put faster image generation and conversational video editing into AI Studio, the Gemini API and enterprise tooling.

By Ryan Merket · Published Jun 30, 2026, 11:46am CT

Why it matters

Google is cutting generative media into faster, cheaper API tiers so startups can build image and video workflows that would otherwise have been too slow or too expensive to ship.


Google released Nano Banana 2 Lite and Gemini Omni Flash for developers on June 30, putting two new generative media models into Google AI Studio, the Gemini API and the Gemini Enterprise Agent Platform.

The launch was flagged by Logan Kilpatrick (@OfficialLoganK) in a thread on X, where he described Nano Banana 2 Lite as a sub-4-second image model priced at $0.034 for a 1K-resolution image and Gemini Omni Flash as a video editing model priced at $0.10 per second of output. Google's own blog post confirms the same pricing and availability, and frames the release around a familiar constraint for generative media startups: latency and inference cost now determine what can actually be built.

The announcement came from Google DeepMind product managers Alisa Fortin and Anish Nangia, who positioned Nano Banana 2 Lite as the lowest-cost, fastest member of the Nano Banana image family. Its model name is gemini-3.1-flash-lite-image, and Google is recommending it as the replacement for developers using the original Nano Banana model, gemini-2.5-flash-image.

That replacement framing matters. The first wave of consumer image-generation products proved demand for prompt-driven visuals, but many developer workflows still break down when generation takes too long or when each iteration is too costly. Google is pushing Nano Banana 2 Lite at the part of the market where users will generate and discard many images before selecting one: prototyping, visual drafting, feed-scale creative tooling and automated image pipelines.
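
For teams still calling the original model, the recommended move starts with a model-ID change. The sketch below uses the two model names given in the announcement; the mapping table and the resolve_image_model helper are hypothetical names introduced for illustration, and the article does not say whether the swap is drop-in for existing request parameters.

```python
# Model IDs as stated in Google's announcement.
# The mapping and helper below are illustrative, not part of any Google SDK.
LEGACY_TO_RECOMMENDED = {
    "gemini-2.5-flash-image": "gemini-3.1-flash-lite-image",
}


def resolve_image_model(model_id: str) -> str:
    """Return the recommended replacement for a legacy ID, or the ID unchanged."""
    return LEGACY_TO_RECOMMENDED.get(model_id, model_id)


if __name__ == "__main__":
    print(resolve_image_model("gemini-2.5-flash-image"))
```

Keeping the mapping in one place means a pipeline can be pointed at the newer model, and rolled back, without touching call sites.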

https://x.com/GoogleDeepMind/status/2071988047445438475?s=20

Google says Nano Banana 2 Lite produces text-to-image outputs in 4 seconds and is built for high-throughput workloads where speed and cost are the main constraints. The company also says the model keeps prompt adherence, character consistency and legible in-image text rendering despite the cheaper inference profile. Those are company-asserted performance claims; the price, model name and platform availability are directly stated in Google's developer materials.

The product lineup is now more segmented. Nano Banana 2 Lite is the speed-and-cost tier. Nano Banana 2, formally Gemini 3.1 Flash Image, is Google's generalist image model. Nano Banana Pro, formally Gemini 3 Pro Image, remains the higher-control tier for complex work where accuracy and reasoning matter more than latency. The legacy Nano Banana model remains in the family, but Google's docs now tell developers to move to Nano Banana 2 Lite for better quality, faster generation and lower API pricing.

Nano Banana 2 Lite is not limited to developer tooling. Google says it is also rolling into consumer surfaces including AI Mode in Search, the Gemini app, NotebookLM, Google Photos, Stitch, Google Flow and Google Ads. That breadth is the strategic tell: Google is not treating image generation as a standalone app category. It is making it a layer inside products that already have distribution, from search sessions to ad creation.

https://x.com/GoogleDeepMind/status/2071988050012303710?s=20

Gemini Omni Flash is the video half of the release. The model, gemini-omni-flash-preview, is now in public preview for developers through Google AI Studio and the Gemini API. Google says it supports video generation and conversational editing from combinations of text, image and video inputs, and prices output at $0.10 per second, the same listed rate as Veo 3.1 Fast.

Omni Flash is designed less like a one-shot text-to-video model and more like an editing surface. Google describes its core use cases as natural-language video edits, multimodal references, use of Gemini's real-world knowledge, and synchronization between text, graphics and action. In practical terms, that means a developer can build a workflow where a user supplies a still image, asks for changes in plain English, and receives a short edited video rather than learning timeline tools.

The limits are still material. Google says Omni currently offers 10-second video generations. Uploading audio references and scene extension are not supported in the Gemini API for this model. Google also says video references up to 3 seconds are accepted by the API schema but are not correctly processed by the model at this time, and that character consistency can still degrade when scenes change or camera movement is introduced.

Those caveats are important because they mark the boundary between a demo and a production creative workflow. A 10-second clip can power ads, social media assets, storyboards and product previews. It does not yet replace longer-form editing systems. Google is instead giving developers a cheaper, API-accessible way to turn still images and prompts into short motion assets, then iterate through conversational instructions.
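
The listed prices make the economics easy to estimate. The sketch below is a rough calculator built only on the figures reported here ($0.034 per 1K-resolution image, $0.10 per second of video, 10-second generations); the function names are hypothetical, and it assumes list prices with no input charges, retries or discounts, none of which the article addresses.

```python
from decimal import Decimal

# List prices as reported in the article; they may change after launch.
IMAGE_PRICE_USD = Decimal("0.034")         # per 1K-resolution image, Nano Banana 2 Lite
VIDEO_PRICE_USD_PER_SEC = Decimal("0.10")  # per second of output, Gemini Omni Flash
MAX_CLIP_SECONDS = 10                      # current generation limit cited by Google


def image_cost(n_images: int) -> Decimal:
    """Cost of generating n_images 1K-resolution images."""
    return IMAGE_PRICE_USD * n_images


def video_cost(seconds: int) -> Decimal:
    """Cost of one generated clip of the given length in seconds."""
    if not 0 < seconds <= MAX_CLIP_SECONDS:
        raise ValueError(f"clip length must be 1 to {MAX_CLIP_SECONDS} seconds")
    return VIDEO_PRICE_USD_PER_SEC * seconds


def draft_then_animate_cost(n_drafts: int, clip_seconds: int) -> Decimal:
    """Hypothetical workflow: generate n_drafts images, then animate one of them."""
    return image_cost(n_drafts) + video_cost(clip_seconds)


if __name__ == "__main__":
    print(image_cost(1000))                 # 34.000
    print(video_cost(10))                   # 1.00
    print(draft_then_animate_cost(20, 10))  # 1.680
```

On these assumptions, a thousand draft images cost about $34 and a full 10-second clip costs $1, so a single clip costs roughly as much as 29 images.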

Google is also pushing the two models as a combined stack. Its examples include generating an image with Nano Banana 2 Lite and then passing that image into Gemini Omni Flash to animate it. The company says developers can use the Interactions API to preserve session history and context for multi-turn media experiences, including up to three sequential edits.

The demo set is pointed at commercial use cases rather than art experiments. Google's blog describes Anywhere, a selfie-to-landmark demo; Space Lift, an interior design demo; and Omni product studio, which turns static product imagery into short e-commerce videos. That is the market Google is chasing with this release: not just creators making clips, but software companies building vertical workflows where generated images become inputs for generated video.

Google says both Nano Banana 2 Lite and Gemini Omni use SynthID watermarking, and that AI content can be verified through the Gemini app, Gemini in Chrome or Search. That transparency layer is becoming table stakes for models that can generate realistic images and video at API scale.

Kilpatrick's thread put the sales pitch bluntly: speed will unlock use cases where latency sensitivity has blocked adoption. The harder test starts after launch. If Nano Banana 2 Lite's price and response time hold under real developer workloads, the model could make image generation cheap enough to disappear into ordinary product flows. If Omni Flash can make video editing feel conversational without losing control across shots, Google's Gemini API becomes more than another model endpoint. It becomes infrastructure for the next crop of media-native apps.
