How to Generate a Narrated Explainer Video From a Single Prompt
Turn one topic prompt into a narrated explainer video with AI-generated script, visuals, animation, and voiceover. A single flow handles all four steps.
Type a topic. Get back a narrated explainer video with a written script, generated visuals, animated footage, and a professional voiceover. I built this flow on PlugNode's canvas in about fifteen minutes, and each run finishes in under two minutes.
This tutorial covers every node, every wire, and every config decision. By the end you'll have a working flow that turns a single prompt into downloadable video and audio assets.
What you'll build
A 7-node flow that chains four AI models:
- Gemini writes a structured script with scene descriptions from your topic
- Nano Banana generates a hero visual based on the script
- Veo animates that visual into a short clip
- ElevenLabs narrates the full script as a voiceover
The output: a video clip and a narration audio file, ready to composite in your editor of choice.
| Node | Type | Model / Provider | Purpose |
|---|---|---|---|
| manual-trigger | Trigger | None | Starts the flow |
| text-input | Input | None | Topic prompt |
| text | Generation | Gemini 2.5 Flash | Writes script + scene descriptions |
| image | Generation | Nano Banana Pro | Generates hero scene visual |
| video | Generation | Veo | Animates the visual into a clip |
| audio | Generation | ElevenLabs | Narrates the script |
| output | Output | None | Collects final assets |
The four-tool problem
Explainer videos eat time because they require four separate skills: scriptwriting, illustration, animation, and voice acting. AI tools exist for each step. The problem is stitching them together.
Open ChatGPT, copy the script, paste it into Midjourney, download the image, upload it to Runway, wait for the clip, copy the script again, paste it into ElevenLabs, download the audio. That is four browser tabs and a lot of copy-paste. Miss one detail in the script and you start the cycle over.
PlugNode wires all four steps into a single flow. Change the topic, click Run, and every downstream node updates automatically. No copy-paste. No tab switching. No re-uploading.
Prerequisites
- A PlugNode account (free tier works)
- API keys added in Settings: Gemini, ElevenLabs
- A topic you want to explain (one sentence is enough)
Open a blank canvas from your dashboard. Everything below happens on that canvas.
Step 1: Add the trigger and topic input
Drag a Manual Trigger node onto the canvas. This fires the flow when you click Run.
Add a Text Input node. Label it "Topic Prompt." This holds your explainer topic as a short sentence. Example value: "How solar panels convert sunlight into electricity. Audience: high school students. Tone: clear, visual, friendly."
Wire the Manual Trigger to the Text Input node. The trigger tells the flow to start, and the text input feeds your topic into the first generation step.
Step 2: Generate the script with Gemini
Add a Text node. Open its config panel and select Gemini 2.5 Flash as the model.
Set the system prompt:
You are a short-form video scriptwriter. Given a topic, write an explainer
script for a 30-60 second video. Format:
TITLE: (5-10 words)
SCRIPT: (3-5 short paragraphs, clear and direct)
SCENE DESCRIPTION: (1 paragraph describing the key visual scene
for the hero image: setting, objects, colors, mood)
Write for spoken narration. Short sentences. No jargon.
Wire the Text Input's output to the Text node's prompt port. Gemini reads your topic and returns a structured script plus a scene description the Image node can use.
I tested this with the solar panel topic. The response came back in 1.4 seconds:
TITLE: How Solar Panels Work
SCRIPT: Sunlight hits a solar panel and something invisible happens.
Photons knock electrons loose inside silicon cells. Those electrons
flow through a circuit and create electricity. An inverter converts
the current so your home appliances can use it. One rooftop panel
generates enough power to run a refrigerator all day.
SCENE DESCRIPTION: A rooftop solar panel array under bright midday sun.
Blue-black crystalline cells reflect light. A cutaway view shows
electrons moving through a simplified circuit. Warm tones, clean sky,
no clouds. Technical but approachable.
Step 3: Generate the hero visual
Add an Image node. Select Nano Banana Pro as the model.
Wire the Text node's output to the Image node's prompt port. Then add a static instruction in the config:
Generate a hero image based on the SCENE DESCRIPTION section above.
Editorial illustration style, clean composition, vibrant colors. 1024x1024.
The Image node reads the Gemini output and pulls visual direction from the scene description. This gives you artwork that matches the script tone, not a generic stock image.
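The hand-off only works if the script output actually carries the three labels from the Step 2 prompt. Here is a minimal Python sketch for checking that on a saved response. The helper name `split_sections` is hypothetical, and this only illustrates the text format; it is not a description of how PlugNode or the image model reads the prompt internally.

```python
import re

# Section labels requested by the Step 2 system prompt.
LABELS = ("TITLE", "SCRIPT", "SCENE DESCRIPTION")


def split_sections(response: str) -> dict[str, str]:
    """Split a 'LABEL: text' response into a {label: text} dict.

    Hypothetical helper for illustration; assumes each label appears
    at the start of a line, as in the sample output from Step 2.
    """
    pattern = re.compile(
        r"^(" + "|".join(re.escape(label) for label in LABELS) + r"):\s*",
        re.MULTILINE,
    )
    matches = list(pattern.finditer(response))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        body = response[match.end():end]
        # Collapse hard line wraps into single spaces.
        sections[match.group(1)] = " ".join(body.split())
    return sections


sample = """TITLE: How Solar Panels Work
SCRIPT: Sunlight hits a solar panel and something invisible happens.
Photons knock electrons loose inside silicon cells.
SCENE DESCRIPTION: A rooftop solar panel array under bright midday sun.
Warm tones, clean sky, no clouds."""

sections = split_sections(sample)
print(sections["SCENE DESCRIPTION"])
```

If a label is missing from `sections`, the scene description probably never reached the Image node in usable form, and the Step 2 prompt is the place to fix it.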
Generation takes 8-15 seconds depending on queue depth.
Step 4: Animate the visual with Veo
Add a Video node. Select Veo as the model.
Wire the Image node's output to the Video node's image port. In the prompt field:
Slow pan across the scene. Subtle particle effects suggesting energy flow.
Warm lighting. Duration: 6 seconds. No text overlays.
Veo takes the generated image and produces an animated clip. Six seconds is enough for an explainer intro or a social teaser. Keep the prompt short here. Veo produces cleaner results with minimal direction.
Video generation runs 30-60 seconds. The node shows a progress indicator on the canvas while it works.
Step 5: Narrate the script with ElevenLabs
Add an Audio node. Select ElevenLabs as the provider.
Wire the Text node's output (your full script) to the Audio node's text port. Pick a voice from the dropdown. I used "Drew" for this test because it fits a calm explainer tone, but any voice works.
The Audio node synthesizes the script as spoken narration. Output is an MP3 file. For a 30-second script, expect roughly 25-35 seconds of audio depending on the voice's pacing.
Synthesis takes 2-4 seconds.
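To sanity-check script length before spending a run, you can estimate narration time from word count. The sketch below assumes a speaking rate of 150 words per minute. That is a common rule of thumb for narration, not a figure from ElevenLabs, and real pacing varies by voice. Both helper names are hypothetical.

```python
def estimate_seconds(script: str, words_per_minute: int = 150) -> float:
    """Estimate narration length in seconds from word count.

    words_per_minute=150 is an assumed rule-of-thumb rate, not an
    ElevenLabs setting; tune it to the voice you pick.
    """
    word_count = len(script.split())
    return word_count * 60 / words_per_minute


def fits_target(script: str, max_seconds: float = 60.0) -> bool:
    """Return True if the estimated narration fits the target length."""
    return estimate_seconds(script) <= max_seconds


script = (
    "Sunlight hits a solar panel and something invisible happens. "
    "Photons knock electrons loose inside silicon cells."
)
print(f"{estimate_seconds(script):.1f}s, fits: {fits_target(script)}")
```

Under that assumption, a 30-60 second video corresponds to roughly 75-150 words of script.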
Step 6: Collect and download
Add an Output node. Wire three connections into it:
- Video node output → Output node (video port)
- Audio node output → Output node (audio port)
- Text node output → Output node (text port)
The Output node collects all three assets. After a run completes, download each file from the execution panel, or inspect the script text inline.
Running the flow
Click Run in the toolbar. The canvas executes nodes in dependency order:
- Trigger fires
- Text Input resolves (your topic)
- Gemini writes the script (~1.4s)
- Nano Banana generates the hero image (~12s)
- Veo produces the animated clip (~45s)
- ElevenLabs synthesizes the voiceover (~3s)
- Output collects everything
Total wall-clock time on my test run: 63 seconds. Veo is the bottleneck. Everything else is fast.
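Dependency order can be sketched in a few lines of Python using the standard library's `graphlib`. The graph below mirrors the wiring from Steps 1-6, and the timings are the approximate figures from my test run. This is an illustration of the idea, not PlugNode's scheduler: whether the canvas overlaps the audio branch with the image and video branch is an assumption I have not verified, so the sketch prints both totals.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its upstream wires),
# following the wiring described in Steps 1-6.
DEPENDS_ON = {
    "manual-trigger": set(),
    "text-input": {"manual-trigger"},
    "text": {"text-input"},
    "image": {"text"},
    "video": {"image"},
    "audio": {"text"},
    "output": {"video", "audio", "text"},
}

# Approximate per-node timings from the test run, in milliseconds.
DURATION_MS = {"text": 1400, "image": 12000, "video": 45000, "audio": 3000}


def finish_times(depends_on, duration_ms):
    """Earliest finish time per node if independent branches overlap."""
    finish = {}
    for node in TopologicalSorter(depends_on).static_order():
        start = max((finish[dep] for dep in depends_on[node]), default=0)
        finish[node] = start + duration_ms.get(node, 0)
    return finish


order = list(TopologicalSorter(DEPENDS_ON).static_order())
sequential_ms = sum(DURATION_MS.values())
parallel_ms = finish_times(DEPENDS_ON, DURATION_MS)["output"]

print(order)
print(f"one node at a time: {sequential_ms / 1000:.1f}s")
print(f"branches overlapping: {parallel_ms / 1000:.1f}s")
```

Either way the model time is around a minute, and the text, image, and video chain accounts for nearly all of it, which matches Veo being the bottleneck.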
Open the Execution Log in the bottom panel to see per-node timing, token counts, and any errors. Each node shows its input/output pair, so you can trace exactly what happened without re-running.
Iterating without starting over
This is where the flow structure pays off. Say the script is good but the visual misses the mark. Open the Image node config, tweak the style instruction, and re-run. Only the Image, Video, and downstream nodes execute again. The script stays the same.
Want a different voice? Swap the ElevenLabs voice in the Audio node dropdown. Re-run. The script stays, the image stays, the video stays. Only the audio regenerates.
Need a completely different angle? Change the topic in the Text Input node. Click Run. Every node picks up the new input and produces fresh outputs end to end.
This is the difference between a flow and a stack of browser tabs. Each piece is independent but connected. Fix one step without touching the rest.
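The rule behind partial re-runs is simple: a changed node invalidates itself and everything downstream of it. A minimal Python sketch of that rule, using the wiring from Steps 1-6, looks like this. The function name is hypothetical, and this models the behavior described above rather than PlugNode's actual implementation.

```python
from collections import deque

# Wires from Steps 1-6: each node maps to the nodes it feeds.
FEEDS = {
    "manual-trigger": ["text-input"],
    "text-input": ["text"],
    "text": ["image", "audio", "output"],
    "image": ["video"],
    "video": ["output"],
    "audio": ["output"],
    "output": [],
}


def nodes_to_rerun(changed: str, feeds: dict[str, list[str]]) -> set[str]:
    """Return the changed node plus everything reachable downstream of it."""
    dirty = {changed}
    queue = deque([changed])
    while queue:
        for child in feeds[queue.popleft()]:
            if child not in dirty:
                dirty.add(child)
                queue.append(child)
    return dirty


print(sorted(nodes_to_rerun("image", FEEDS)))
print(sorted(nodes_to_rerun("audio", FEEDS)))
```

Changing the Image node marks the image, video, and output nodes; changing the voice marks only audio and output; changing the topic marks everything after the trigger.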
Publishing as an API
Once the flow works manually, you can automate it. Replace the Manual Trigger with an HTTP Trigger node. Add a Respond to Webhook node wired to your Output.
Hit Publish in the top bar. PlugNode generates a signed URL:
POST https://plugnode.ai/api/trigger/{secret}/{nodeId}
Send a POST with a JSON body containing your topic. The endpoint returns a 202 (async) or the full response if you append ?wait=true.
This lets your LMS, CMS, or internal tool trigger explainer video creation programmatically. New course module uploaded? Fire the webhook. New product feature shipped? Auto-generate the explainer.
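Calling the published endpoint from Python might look like the sketch below, using only the standard library. The URL shape and the `?wait=true` flag come from the section above. The `"topic"` JSON key, the placeholder secret, and the placeholder node ID are assumptions, so match them to your own signed URL and to whatever field your HTTP Trigger node expects.

```python
import json
import urllib.request

BASE_URL = "https://plugnode.ai/api/trigger"


def build_trigger_request(secret: str, node_id: str, topic: str,
                          wait: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for a published flow.

    The "topic" JSON key is an assumption; match it to whatever field
    your HTTP Trigger node expects.
    """
    url = f"{BASE_URL}/{secret}/{node_id}"
    if wait:
        url += "?wait=true"  # ask for the full response instead of a 202
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; substitute the secret and node ID from your signed URL.
request = build_trigger_request(
    "YOUR_SECRET",
    "YOUR_NODE_ID",
    "How solar panels convert sunlight into electricity.",
    wait=True,
)
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request, timeout=120) as response:
#     print(response.status, response.read().decode("utf-8"))
```

With `wait=True` the call blocks for the whole run, so give the client a timeout comfortably above the video generation time.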
Troubleshooting
Veo returns a timeout. Video generation occasionally exceeds 60 seconds under load. Retry the run. If it fails consistently, shorten the prompt or reduce the requested duration.
ElevenLabs returns "quota exceeded." Check your ElevenLabs dashboard for character limits on your plan tier. The audio node passes through provider errors directly.
Image doesn't match the script. The scene description from Gemini might be too abstract. Add grounding detail in the Image node config: "Illustration of [specific subject], clean background, editorial style."
Script is too long for a 30-second video. Adjust the Gemini system prompt. Change "30-60 second video" to "20-30 second video" and add "maximum 3 paragraphs."
FAQ
AI Narrated Explainer Videos: Common Questions
How much does one run cost?
You pay each provider at their standard rates. A typical run: Gemini Flash (~$0.001), Nano Banana ($0.01-0.03), Veo ($0.05-0.10 for a 6-second clip), ElevenLabs (~$0.003 for 200 characters). Total: roughly $0.06-0.13 per explainer. No PlugNode markup.
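The per-provider figures can be summed directly, which is handy when you want to budget a batch. This small Python sketch uses only the numbers quoted in this answer; treat them as rough estimates rather than provider price sheets.

```python
# Per-run cost ranges quoted above, in tenths of a cent (integers avoid
# floating-point rounding). Each entry is (low, high).
COST_MILLS = {
    "Gemini Flash": (1, 1),    # ~$0.001
    "Nano Banana": (10, 30),   # $0.01-0.03
    "Veo": (50, 100),          # $0.05-0.10 for a 6-second clip
    "ElevenLabs": (3, 3),      # ~$0.003 for 200 characters
}

low = sum(lo for lo, _ in COST_MILLS.values())
high = sum(hi for _, hi in COST_MILLS.values())

print(f"per run: ${low / 1000:.3f} to ${high / 1000:.3f}")
print(f"100 runs: ${low / 10:.2f} to ${high / 10:.2f}")
```

The line items add up to $0.064-$0.134 per run, and Veo accounts for most of it at either end of the range.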
Can I use this for multi-scene videos?
Not yet in a single flow run. Each run produces one hero scene clip. For multi-scene videos, create parallel flows (one per scene) and composite the clips in your editor. Multi-scene assembly is on the PlugNode roadmap.
Can I swap Gemini for OpenAI?
Yes. The Text node supports both. Open the config, switch the model to GPT-4.1 or o3. The rest of the flow stays wired.
Does the flow composite the video and audio together?
Not automatically. The flow outputs them as separate files. Composite them in your editor, CapCut, or with an ffmpeg script. A future node may handle this, but today you get raw assets.
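For the ffmpeg route, one possible approach is to loop the short clip under the narration and stop when the audio ends. The Python sketch below only builds the command. The file names are placeholders, and it assumes the downloaded clip is an MP4 whose video stream can be copied without re-encoding; if looping with stream copy misbehaves on your file, re-encode the video instead.

```python
import shlex


def build_mux_command(video_path: str, audio_path: str, out_path: str) -> list[str]:
    """Build an ffmpeg command that lays the narration over the looped clip.

    File names are placeholders. The clip is looped because a 6-second
    video is shorter than a 30-60 second narration.
    """
    return [
        "ffmpeg",
        "-stream_loop", "-1",  # repeat the video input indefinitely
        "-i", video_path,
        "-i", audio_path,
        "-map", "0:v:0",       # video from the first input
        "-map", "1:a:0",       # audio from the second input
        "-c:v", "copy",        # keep the video stream as-is (assumption)
        "-c:a", "aac",         # encode narration as AAC, a common MP4 choice
        "-shortest",           # stop when the narration ends
        out_path,
    ]


command = build_mux_command("clip.mp4", "narration.mp3", "explainer.mp4")
print(shlex.join(command))

# To run it (requires ffmpeg on PATH):
# import subprocess
# subprocess.run(command, check=True)
```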
What voices are available in ElevenLabs?
The Audio node offers the full ElevenLabs stock catalog: Rachel, Drew, Clyde, Paul, Domi, Dave, Fin, Sarah, Antoni, Thomas, and others. Custom cloned voices are supported if your ElevenLabs plan includes them.
Can I use this for course content?
Yes. Course creators are one of the primary audiences for this flow. Set the Gemini system prompt to match your course tone and reading level. Run once per lesson topic. Download the assets and drop them into your course platform.
The full use case page is at /use-cases/narrated-explainer-videos. For more on publishing flows as APIs, see How to Publish an AI Flow as a Production API.