Tutorial · 2026-04-29 · 8 min read

Generate AI Product Video Ads From a Single Image

Turn one product photo into a finished video ad with AI-generated copy, styled visuals, animated footage, and voiceover. An 8-node flow that runs in under 90 seconds.

PlugNode Team

One product photo goes in. A finished video ad with copy, motion, and voiceover comes out. I built this flow on PlugNode's canvas in about twelve minutes, and it runs in under 90 seconds per execution.

This tutorial walks through every node, every wire, and every config field. By the end you'll have a working flow you can trigger manually or publish as an API endpoint for your store.

What you'll build

An 8-node flow that chains four AI models:

  1. Gemini writes ad copy (headline + CTA) from your product image
  2. Nano Banana generates a styled product visual
  3. Veo produces a short video clip from that visual
  4. ElevenLabs narrates the ad copy as a voiceover

The output: a video file and an audio file, ready to composite in any editor or feed into your ad platform.

| Node | Type | Model / Provider | Purpose |
| --- | --- | --- | --- |
| manual-trigger | Trigger | None | Starts the flow |
| text-input | Input | None | Product name + brief |
| file-input | Input | None | Product photo (PNG/JPG) |
| textGeneration | Generation | Gemini 2.5 Flash | Writes ad script |
| imageGeneration | Generation | Nano Banana Pro | Styled product hero |
| videoGeneration | Generation | Veo | Animated product clip |
| audioGeneration | Generation | ElevenLabs | Voiceover narration |
| output | Output | None | Collects final assets |

Prerequisites

  • A PlugNode account (free tier works)
  • API keys added in Settings: Gemini, ElevenLabs
  • One product photo (PNG or JPG, under 4MB)

Open a blank canvas from your dashboard. Everything below happens on that canvas.

Step 1: Add the trigger and inputs

Drag a Manual Trigger node onto the canvas. This fires the flow when you click Run.

Next, add two input nodes:

  • Text Input: label it "Product Brief." This holds the product name, target audience, and tone. Example value: "Ceramic travel mug. Audience: commuters. Tone: minimal, confident."
  • File Input: label it "Product Photo." Upload your product photo (PNG or JPG) here.

You'll wire both inputs into the first generation node (the Text node) once you add it in the next step: Text Input to the prompt port, File Input to the image port.

Step 2: Generate ad copy with Gemini

Add a Text node. Open its config panel and select Gemini 2.5 Flash as the model.

Set the system prompt:

You are a direct-response copywriter. Given a product image and brief,
write a 15-second ad script. Format:
 
HEADLINE: (5-8 words)
BODY: (2 sentences, benefit-led)
CTA: (1 short phrase)
 
No fluff. No filler adjectives.

Wire the Text Input's output to the Text node's prompt port. Wire the File Input's output to the Text node's image port. Gemini reads both the text brief and the photo, then returns structured copy.

I tested this on a white ceramic mug photo. The response came back in 1.2 seconds:

HEADLINE: Your morning, insulated.
BODY: Double-wall ceramic keeps coffee hot for 4 hours. Fits every cupholder.
CTA: Shop the mug.

Step 3: Generate a styled product image

Add an Image node. Select Nano Banana Pro as the model.

For the prompt, wire the Text node's output directly. Then append a static instruction in the config:

Generate a lifestyle product photo based on the ad copy above.
Clean background, soft lighting, editorial style. 1024x1024.

The Image node takes the Gemini output (your ad copy) as context and generates a fresh product hero shot. This gives you a styled visual that matches the ad tone, not the raw product photo you uploaded.

Generation takes 8-15 seconds depending on queue depth.

Step 4: Create a video clip with Veo

Add a Video node. Select Veo as the model.

Wire the Image node's output to the Video node's image port. In the prompt field:

Slow zoom into the product. Soft ambient lighting shifts from warm to cool.
Duration: 4 seconds. No text overlays.

Veo takes the generated image and produces a short animated clip. Four seconds is enough for a social ad unit. I found that shorter prompts produce cleaner results here. Veo tends to over-interpret long descriptions.

Video generation runs 30-60 seconds. The node shows a progress indicator on the canvas while it works.

Step 5: Add voiceover with ElevenLabs

Add an Audio node. Select ElevenLabs as the provider.

Wire the Text node's output (your ad copy) to the Audio node's text port. Pick a voice from the dropdown. I used "Rachel" for this test, but any voice works.

The Audio node synthesizes the ad script as spoken audio. Output is an MP3 file, typically 8-12 seconds for a 15-second script; synthesized voices pace a little faster than a live read, so the narration still fits the clip.

Synthesis takes 2-4 seconds.

Step 6: Collect outputs

Add an Output node. Wire three connections into it:

  1. Video node output → Output node (video port)
  2. Audio node output → Output node (audio port)
  3. Text node output → Output node (text port)

The Output node collects all three assets. After a run completes, you can download each file from the execution panel, or inspect the text output inline.

Step 7: Run the flow

Click Run in the toolbar. The canvas executes nodes in dependency order:

  1. Trigger fires
  2. Inputs resolve (text brief + photo)
  3. Gemini writes copy (~1.2s)
  4. Nano Banana generates the hero image (~12s)
  5. Veo produces the video clip (~45s)
  6. ElevenLabs synthesizes voiceover (~3s)
  7. Output collects everything

Total wall-clock time on my test run: 61 seconds. The slowest node is always Veo. Everything else is fast.

Open the Execution Log in the bottom panel to see per-node timing, token counts, and any errors. Each node shows its input/output pair, so you can debug without re-running the entire flow.

Publishing as an API

Once the flow works manually, you can publish it as an endpoint. Replace the Manual Trigger with an HTTP Trigger node. Add a Respond to Webhook node wired to your Output.

Hit Publish in the top bar. PlugNode generates a signed URL:

POST https://plugnode.ai/api/trigger/{secret}/{nodeId}

Send a multipart POST with your product image and a JSON body containing the brief. The endpoint returns a 202 (async) or the full response if you append ?wait=true.
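For reference, here's a minimal Python sketch of that call. The multipart field names ("image", "body", "brief") and filenames are assumptions for illustration; check the endpoint docs PlugNode generates for your flow's actual schema.

import json
import requests

# Hypothetical signed URL -- substitute the one PlugNode generates on publish.
url = "https://plugnode.ai/api/trigger/{secret}/{nodeId}"
brief = {"brief": "Ceramic travel mug. Audience: commuters. Tone: minimal, confident."}

with open("mug.png", "rb") as photo:
    resp = requests.post(
        url,
        params={"wait": "true"},  # block for the full response instead of a 202
        files={
            "image": ("mug.png", photo, "image/png"),
            "body": (None, json.dumps(brief), "application/json"),  # JSON part of the multipart body
        },
        timeout=120,  # Veo alone can run 30-60 seconds
    )
resp.raise_for_status()
print(resp.json())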

Rate limit: 60 requests per minute per trigger. Secret rotation available in Settings → Flow.

Every publish creates a versioned snapshot. If you break something, roll back to the previous version in one click.

Troubleshooting

Veo returns a timeout. Video generation occasionally exceeds 60 seconds under load. Retry the run. If it fails consistently, shorten the prompt or reduce implied complexity.

ElevenLabs returns "quota exceeded." Check your ElevenLabs dashboard for character limits on your plan tier. The audio node passes through provider errors directly.

Image node produces a blank or irrelevant output. The prompt wired from the Text node might be too abstract. Add a grounding phrase in the Image node's config: "Product photo of [specific item], white background."

Gemini ignores the image. Confirm the File Input's output wire connects to the Text node's image port (not the prompt port). The image port is separate.

What's next

This flow produces raw assets (video + audio + copy). For a production ad pipeline, consider these extensions:

  • Add an Image Resize node between the Image and Video nodes to force exact platform dimensions (1080x1080 for Instagram, 1920x1080 for YouTube)
  • Chain a second Text node after the first to generate platform-specific copy variants (Meta, TikTok, Google)
  • Wire the HTTP Trigger version into your product catalog webhook so new SKUs auto-generate ads on upload

The full use case page is at /use-cases/product-video-ads. For more on publishing flows as APIs, see How to Publish an AI Flow as a Production API.

FAQ

How much does one run cost?

You pay each provider at their standard rates. A typical run: Gemini Flash ($0.001 for the text call), Nano Banana ($0.01-0.03 for the image), Veo ($0.05-0.10 for a 4-second clip), ElevenLabs ($0.002 for 100 characters). Summed, that's roughly $0.06-0.13 per ad. No PlugNode markup.

Can I swap Gemini for OpenAI?

Yes. The Text node supports both. Open config, switch the model to GPT-4o or GPT-5. The rest of the flow stays wired.

Does the flow composite the video and audio together?

Not automatically. The flow outputs them as separate files. Compositing happens downstream in your editor, ad platform, or an ffmpeg script. A future node may handle this, but today the flow returns raw assets.
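If you go the ffmpeg route, the mux is a single command. A minimal sketch, wrapped in Python for scripting; the filenames are placeholders for the assets you download from the Output node:

import subprocess

# Combine the Veo clip and the ElevenLabs voiceover into one MP4.
subprocess.run(
    [
        "ffmpeg",
        "-i", "clip.mp4",       # video from the Video node
        "-i", "voiceover.mp3",  # audio from the Audio node
        "-c:v", "copy",         # keep the video stream untouched
        "-c:a", "aac",          # re-encode the MP3 to AAC for MP4 compatibility
        "-shortest",            # end when the shorter stream ends
        "ad_final.mp4",
    ],
    check=True,
)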

Can I batch multiple products in one call?

Not in a single flow run. Each run processes one product image. For batch processing, call the HTTP Trigger endpoint in a loop from your backend, one POST per SKU.
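A rough sketch of that loop, reusing the hypothetical request shape from the publishing section (SKUs, file paths, and briefs are placeholders):

import json
import time
import requests

URL = "https://plugnode.ai/api/trigger/{secret}/{nodeId}"  # your flow's signed URL
products = [
    ("MUG-01", "mug.png", "Ceramic travel mug. Audience: commuters."),
    ("BTL-02", "bottle.png", "Insulated steel bottle. Audience: hikers."),
]

for sku, path, brief in products:
    with open(path, "rb") as photo:
        resp = requests.post(URL, files={
            "image": (path, photo, "image/png"),
            "body": (None, json.dumps({"brief": brief}), "application/json"),
        })
    print(sku, resp.status_code)  # 202 means the run was queued
    time.sleep(1)  # stay under the 60 requests/minute trigger limit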

What image formats does the File Input accept?

PNG and JPG. Max file size is governed by your workspace storage quota (default 5GB total). Individual files should stay under 4MB for best results with vision models.

Generate your first video ad in 3 minutes.

Free to start. No credit card. Upload a product photo, connect your AI models, click Run.