Gemini Omni Flash — Google's Newest Video Model

Announced at Google I/O 2026, Gemini Omni Flash is the first model in Google's Omni family — a multimodal generator that takes text, images, audio, and video as input and outputs cinematic video.
Stronger physics, better scene consistency, and conversational editing in a single model.

Released May 2026 · Multimodal input · Conversational refinement · Real-world physics

  Overview

What Gemini Omni Flash is, and why it matters

Gemini Omni Flash is the production-tier video model in Google's new Omni family. It supersedes Veo 3.1 as Google's flagship video generator and brings four core advances.

Multimodal input as a first-class feature

Gemini Omni Flash treats text, images, audio, and video as equal input modalities. Earlier video models were primarily text-driven with images bolted on; Omni Flash routes all four through the same architecture, so reference media actually steers the result instead of being a hint.

Real-world physics simulation

Google says it has improved how the model simulates gravity, kinetic energy, and fluid dynamics. Falling objects fall correctly. Liquids pour correctly. Cloth and hair move with realistic weight. The result is output that looks physically plausible rather than obviously AI-generated.

Conversational editing

Once a clip exists, you can refine it by replying in natural language. The model remembers the scene across turns — characters stay consistent, lighting holds, composition is preserved. You're editing the same scene, not regenerating from a new prompt.

Grounded in Gemini's knowledge

Because Omni Flash inherits Gemini's broader knowledge of history, science, and culture, prompts about real places, eras, and phenomena produce visuals that look researched. Asking for "a Tokyo izakaya in the 1980s" or "a hurricane viewed from low orbit" returns visuals closer to reality.

  Access

Where Gemini Omni Flash lives, and how to access it

Google ships Gemini Omni Flash through several channels. Each has trade-offs.

In the Gemini app and Google Flow, Gemini Omni Flash is available to Google AI Plus ($20/month), Pro ($30/month), and Ultra ($100/month) subscribers. This route is best if you already use Google's stack and have a subscription. Quotas and usage limits apply per tier.

Gemini Omni Flash specifications at a glance

What the model can do today, based on Google's launch disclosures.

Up to 10-second clips

Flash-tier clips are capped at 10 seconds at launch. This is a deployment decision by Google, not a model limit — extensions are expected.

Multimodal input

Text + image + audio + video as input modalities, in any combination.

Cinematic resolution output

Output is high-resolution and suitable for ads, social media, and professional distribution. Exact resolution and format specs vary by plan and surface.

SynthID provenance

Every clip carries Google's imperceptible SynthID watermark — verifiable as AI-generated through Google's tools, but not a visible mark on the output.

Conversational refinement

Reply with natural-language edits to refine a clip without losing scene state.

Avatar mode (limited)

Google announced an avatar capability — generating videos that look and sound like the user — but is holding back its broadest rollout pending safety review.

  Getting started

How to use Gemini Omni Flash on Gomni

Three steps from sign-up to first clip.

01. Sign up with email

No Google account juggling. Email sign-up takes 30 seconds and you land with starter credits.

02. Pick text-to-video or image-to-video

Type a prompt, drop an image, or do both. You can mix inputs: Gemini Omni Flash treats them all as steering signals.

03. Refine and export

Reply with conversational edits to iterate. Export as MP4 in your target aspect ratio, with no visible watermark and with commercial rights included.

  FAQ

Gemini Omni Flash FAQ

Common questions about Google's Gemini Omni Flash video model.

Try Gemini Omni Flash on Gomni

No Google subscription. No waitlist. Sign up with email and generate your first clip in under a minute.