
Veo 3

Google Veo 3 on a visual canvas. 1080p video with native audio, wired into a flow you can publish as a signed endpoint.

At a glance
  • Veo 3 is Google's text-to-video and image-to-video model with native audio, available through the Gemini API.
  • Supports 720p and 1080p at 4, 6, or 8 seconds. 1080p locks to 8s.
  • PlugNode runs Veo 3 as one option on the Video node. BYO Gemini key, no credit markup.
  • Chain with Gemini Text for scripts and ElevenLabs for branded voiceover, then publish the pipeline behind one URL.

What it is

Veo 3 is Google's second-generation video model and the one that introduced native audio generation to the Veo family. It takes a text prompt and an optional reference image, and produces an MP4 with synchronised ambient audio, dialogue, and music cues at up to 1080p. Compared to Veo 2 it adds higher resolution and native audio. Compared to Veo 3.1 it caps at 1080p and does not support video-extension input. On PlugNode, Veo 3 is one of four models on the Video node: pick it from the dropdown, set a prompt and an optional reference image, and the canvas handles the rest. Your API calls route directly to Google with your Gemini key.

What you can do with it

  • Text-to-video with synchronised audio
  • Image-to-video from a single reference still
  • 1080p in 16:9, or 720p in both 16:9 and 9:16
  • Generate dialogue, ambient sound, and music in one pass
  • Pair with the File Input node to animate uploaded product shots
  • Pair with ElevenLabs Audio for a branded voice layer on top of native audio
  • Publish behind an HTTP Trigger and rotate the secret anytime
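The resolution, aspect-ratio, and duration rules listed on this page can be encoded as a small pre-flight check before a render. This is a sketch, not part of any PlugNode or Gemini SDK — the function and rule table are illustrative:

```python
# Sketch: validate a Veo 3 render request against the limits described on
# this page. 1080p is 16:9 only and locks to 8 s; 720p allows 16:9 or 9:16
# at 4, 6, or 8 s; a reference image forces an 8 s clip. Names are illustrative.

ALLOWED = {
    "1080p": {"aspects": {"16:9"}, "durations": {8}},
    "720p": {"aspects": {"16:9", "9:16"}, "durations": {4, 6, 8}},
}

def validate_request(resolution: str, aspect: str, duration: int,
                     has_reference_image: bool = False) -> None:
    """Raise ValueError if the combination is outside Veo 3's limits."""
    rules = ALLOWED.get(resolution)
    if rules is None:
        raise ValueError(f"unsupported resolution: {resolution}")
    if aspect not in rules["aspects"]:
        raise ValueError(f"{resolution} does not support {aspect}")
    if duration not in rules["durations"]:
        raise ValueError(f"{resolution} does not support {duration}s clips")
    if has_reference_image and duration != 8:
        raise ValueError("a reference image forces an 8s clip")

validate_request("720p", "9:16", 6)    # OK: vertical social cut
validate_request("1080p", "16:9", 8)   # OK: standard ad unit
```

Catching an invalid combination before the node runs saves a failed generation against your Gemini key.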

When to pick Veo 3 over Veo 3.1

Veo 3 and Veo 3.1 overlap in most production scenarios. The deciding factors are usually cost and whether you need 4K output or video extension.

Veo 3 tops out at 1080p and does not accept a video URL as an input. If you need to extend an existing clip or render at 4K, use Veo 3.1 instead. For standard ad-unit work (1080p 16:9, 720p 9:16 for vertical, native audio, single-shot compositions), Veo 3 is the balance between quality and cost.

On PlugNode you can flip between them by changing the dropdown on the Video node without touching downstream connections. A useful pattern: iterate with Veo 3 at 720p while tuning the prompt, then swap to Veo 3.1 at 4K for the final render.
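The draft-then-final pattern above amounts to a one-field config change. A sketch as data — the dict keys here are illustrative, since on the canvas the switch is the Video node's dropdown, not code:

```python
# Iterate cheaply with Veo 3 at 720p, then flip only the model and
# resolution for the final Veo 3.1 render. Everything else — prompt,
# duration, aspect, downstream connections — carries over unchanged.
DRAFT = {"model": "veo-3", "resolution": "720p", "duration": 8, "aspect": "16:9"}
FINAL = {**DRAFT, "model": "veo-3.1", "resolution": "4k"}

assert FINAL["duration"] == DRAFT["duration"]  # settings carry over
```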

How native audio works in Veo 3

When you submit a prompt to Veo 3 that describes speakers, ambient sound, or music, the model renders those as part of the same MP4 that carries the video. It is not a TTS pass or a separate audio endpoint. The audio is baked in and frame-aligned.

For prompts that do not specify sound, the model still produces ambient audio matched to the visual scene. If you need a specific voice, a longer script, or multi-voice dialogue, add an ElevenLabs Audio node after the Video node and mix it over or replace the native track. The ElevenLabs integration page covers the voice options.

The Respond to Webhook node can return both streams in one payload so your caller has flexibility downstream.
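One way to shape that combined payload, as a caller might receive it. The field names ("video", "voiceover", "audio") are hypothetical — match them to whatever contract your caller expects:

```python
# Sketch of a Respond to Webhook payload that carries both the Veo 3 clip
# (with its native, frame-aligned audio) and an optional ElevenLabs
# voiceover track. Field names are illustrative, not a PlugNode schema.
import json
from typing import Optional

def build_response(video_url: str, voiceover_url: Optional[str] = None) -> str:
    payload = {"video": {"url": video_url, "audio": "native"}}
    if voiceover_url:
        payload["voiceover"] = {"url": voiceover_url, "source": "elevenlabs"}
    return json.dumps(payload)

print(build_response("https://example.com/clip.mp4",
                     "https://example.com/voice.mp3"))
```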

Image-to-video in one node

The File Input node on PlugNode accepts an uploaded image or a URL, and the Video node reads it as the reference. Passing a reference image forces the duration to 8 seconds, which is the Veo family's constraint rather than a PlugNode decision.

Typical use cases: animating a product photo into a short showcase clip, adding motion to a still illustration for a social post, or generating a b-roll moment from a single frame pulled from a video capture.

Because the File Input is part of the flow, you can also publish the whole thing as an API that accepts an image upload in the POST body. The product-video-ads use case walks through that pattern end to end.
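A caller-side sketch of hitting such a published flow. The URL path follows the trigger format shown on this page; the JSON field names ("prompt", "image_base64") and the base64-in-JSON upload shape are assumptions — match them to whatever your File Input node actually expects:

```python
# Build (but do not send) a POST to a published PlugNode trigger that
# carries a prompt and an image in the request body. Only standard-library
# modules are used; field names are illustrative.
import base64
import json
import urllib.request

def build_trigger_request(secret: str, node_id: str, prompt: str,
                          image_bytes: bytes) -> urllib.request.Request:
    body = json.dumps({
        "prompt": prompt,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"https://plugnode.ai/api/trigger/{secret}/{node_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("my-secret", "video-node",
                            "Rotate the bottle slowly, studio lighting",
                            b"\x89PNG...")  # real PNG bytes in practice
# urllib.request.urlopen(req) would send it; here we only build the request.
```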

Publishing a Veo 3 pipeline as an endpoint

A typical Veo 3 flow you would ship: HTTP Trigger receives a prompt in JSON, File Input reads a product image sent in the same payload, Gemini Text polishes the prompt or generates an accompanying script, Video (Veo 3, 1080p) renders the clip, and Respond to Webhook returns the MP4 URL.

The whole thing becomes a POST https://plugnode.ai/api/trigger/{secret}/{nodeId} endpoint the moment you press Publish. The rate limit defaults to 60 requests per minute per trigger, secrets rotate from the flow settings, and every publish creates a versioned snapshot. Roll back in one click if a new version breaks a caller's contract.

That is the thing you cannot get by calling the Gemini API directly. See the publish-as-API guide for the full pattern and the Gemini Video integration for the node-level reference.

Honest limits of Veo 3

Veo 3 does not support video extension (feeding an existing clip as input). It caps at 1080p. It does not generate speech that matches a specific brand voice. For that, layer ElevenLabs.

It does not maintain long-form narrative continuity across multiple generated clips. For multi-shot sequences you run several generations and stitch, which is exactly what a visual canvas is good at.

Content policy and safety filters are Google's and apply as they would via the Gemini API. Flows that hit them return an error through the execution log, not a silent fallback.

Run it on PlugNode

Select "Veo 3" on the Video node dropdown, supply a Gemini API key, and the flow runs the model on every execution. Browser preview and server-side trigger runs use the same code path, so local iteration matches production output.

Veo 3 generations are billed by Google at published Gemini API rates. Your PlugNode account is not charged for model compute.

Frequently asked questions

What is Veo 3?
Veo 3 is Google's text-to-video and image-to-video model available through the Gemini API. It generates up to 1080p clips with synchronised audio in 4-, 6-, or 8-second durations.
Does Veo 3 generate audio on its own?
Yes. Veo 3 produces synchronised dialogue, ambient sound, and music cues inside the same MP4 it renders. No separate TTS call is needed for default audio.
Can I use Veo 3 with an image as input?
Yes. PlugNode's File Input node feeds a reference image into the Video node. The model then animates from that still. Clip length is 8 seconds whenever a reference image is present.
How does Veo 3 differ from Veo 3.1?
Veo 3.1 adds 4K output and video-extension input. Veo 3 caps at 1080p and does not accept a video URL as an input. They share API shape, prompt format, and audio behaviour.
Can I turn a Veo 3 flow into an API endpoint?
Yes. Wire an HTTP Trigger into the Video node, end with Respond to Webhook, hit Publish. PlugNode gives you a signed endpoint with rate limiting and version control.
Do I need a credit card for Veo 3 on PlugNode?
You need a Gemini API key billed by Google. That is where the model compute goes. PlugNode itself runs on bring-your-own-keys pricing.

Last updated 2026-04-25

Generate your first video ad in 3 minutes.

Free to start. No credit card. Upload a product photo, connect your AI models, click Run.