Tutorial · 2026-05-01 · 8 min read

How to Create AI Voiceovers From a Script in One Flow

Paste a rough script, get back a studio-quality voiceover in under 30 seconds. This flow uses Gemini for script cleanup and ElevenLabs for synthesis.

PlugNode Team

Paste a rough script. Get back a clean, natural-sounding voiceover in under 30 seconds. I built this flow on PlugNode's canvas in about five minutes, and a 30-second script costs less than $0.01 per run.

This tutorial covers every node, every wire, and every config field. You'll walk out with a working flow you can trigger manually or publish as an API endpoint for your app, store, or podcast pipeline.

What you'll build

A 5-node flow that chains two AI models:

Gemini rewrites your rough script for natural speech (expanding abbreviations, removing URLs, adding pause cues)
ElevenLabs generates a studio-quality voiceover from the cleaned script

The output: a downloadable MP3 file ready for your video editor, podcast host, or ad platform.

Node	Type	Model / Provider	Purpose
manual-trigger	Trigger	None	Starts the flow
text-input	Input	None	Your raw script
text	Generation	Gemini 2.5 Flash	Cleans and rewrites for speech
audio	Generation	ElevenLabs	Synthesizes the voiceover
output	Output	None	Collects the audio file

Prerequisites

A PlugNode account (free tier works)
API keys added in Settings: Gemini, ElevenLabs
A script or rough notes (even bullet points work)

Open a blank canvas from your dashboard. Everything below happens on that canvas.

Step 1: Add the trigger and input

Drag a Manual Trigger node onto the canvas. This fires the flow when you click Run.

Add a Text Input node. Label it "Raw Script." Paste your rough script here. It can be messy. Bullet points, half-sentences, URLs, abbreviations, all fine. The next node handles cleanup.

Example value:

Intro for ep 47. Topic: why DTC brands should test short-form video ads on TikTok before scaling to Meta. Mention avg CPM diff ($4-6 TikTok vs $10-14 Meta). Keep it under 30 sec. Casual tone, not salesy.

Wire the Manual Trigger to the Text Input. The Text Input's output feeds the Gemini node you add in Step 2.

Step 2: Clean the script with Gemini

Add a Text node. Open its config panel and select Gemini 2.5 Flash as the model.

Set the system prompt:

You are a script editor for spoken audio. Given rough notes or a draft script,
rewrite it as a clean voiceover script optimized for text-to-speech.
 
Rules:
- Expand all abbreviations (DTC → direct-to-consumer, CPM → cost per mille)
- Remove URLs and markdown formatting
- Write numbers as words when under 100, digits when over
- Add [pause] markers between sections for natural breathing
- Keep the original tone and intent
- Do not add intros like "Welcome to..." unless the input asks for one
- Output only the final script, no commentary

Wire the Text Input's output to the Text node's prompt port.

I tested this with the rough notes above. Gemini returned a clean 28-second script in 0.9 seconds:

Here's something most DTC brands get wrong. They scale video ads on Meta
before testing on TikTok. [pause] The numbers tell the story. TikTok's
average cost per mille sits between four and six dollars. Meta? Ten to
fourteen. [pause] Test the creative on TikTok first. If it works there,
scale it on Meta with confidence. You'll spend less finding what converts.

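The rough bullet points became a conversational script with natural pacing. That is the point of this node: you skip the manual rewriting step. The model does not always follow every rule, though. In this run "DTC" came back unexpanded despite the first rule, so skim the output before synthesis.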
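
Prompt rules are instructions rather than guarantees, so it can help to check the cleaned script before it reaches the Audio node. The sketch below is plain Python that runs outside PlugNode. The `check_script` function and the `ABBREVIATIONS` list are hypothetical names chosen for illustration, and the patterns cover only the rules that can be checked mechanically (URLs, markdown, unexpanded abbreviations).

```python
import re

# Matches http(s) links and bare www. addresses.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Matches common markdown debris: bold/underline markers, backticks,
# heading prefixes, and bullet prefixes at the start of a line.
MARKDOWN_RE = re.compile(r"(\*\*|__|`|^#{1,6}\s|^\s*[-*]\s)", re.MULTILINE)
# Hypothetical list: extend it with the abbreviations your scripts use.
ABBREVIATIONS = ("DTC", "CPM")


def check_script(script: str) -> list[str]:
    """Return a list of rule violations found in a cleaned script."""
    problems = []
    if URL_RE.search(script):
        problems.append("contains a URL")
    if MARKDOWN_RE.search(script):
        problems.append("contains markdown formatting")
    for abbr in ABBREVIATIONS:
        # Word boundaries keep "DTC" from matching inside longer words.
        if re.search(rf"\b{abbr}\b", script):
            problems.append(f"unexpanded abbreviation: {abbr}")
    return problems
```

Run against the sample output above, this returns a single problem: the unexpanded "DTC" in the first sentence.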

Step 3: Generate the voiceover with ElevenLabs

Add an Audio node. Select ElevenLabs as the provider.

Wire the Text node's output to the Audio node's text port. Pick a voice from the dropdown. I used "Drew" for a casual, mid-range male voice. Other good options:

Rachel: warm, professional female voice (good for product demos)
Sarah: clear, neutral female voice (good for explainers)
Antoni: deep male voice (good for brand ads)
Fin: energetic male voice (good for YouTube intros)

The Audio node sends the cleaned script to ElevenLabs and returns an MP3 file. Synthesis takes 2-5 seconds depending on script length.

For the script above, the output was a 26-second MP3 at 128 kbps, close to the 28-second estimate. Pronunciation was clean, the cadence natural, and the emphasis on the numbers correct.
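
If you need a script to fit a time limit, a rough length estimate before synthesis can save a run. The sketch below assumes a speaking rate of 150 words per minute and half a second per [pause] marker. Both numbers are illustrative assumptions, not ElevenLabs figures, and the real duration depends on the voice you pick. `estimate_seconds` is a hypothetical helper name.

```python
import re

PAUSE_RE = re.compile(r"\[pause\]")


def estimate_seconds(script: str, wpm: float = 150.0, pause_seconds: float = 0.5) -> float:
    """Estimate spoken duration from word count plus a fixed cost per pause marker."""
    pauses = len(PAUSE_RE.findall(script))
    # Strip the markers so they are not counted as spoken words.
    spoken = PAUSE_RE.sub(" ", script)
    words = len(spoken.split())
    return words / wpm * 60.0 + pauses * pause_seconds
```

Tune `wpm` against a few real runs for the voice you use; slower voices will need a lower value.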

Step 4: Collect the output

Add an Output node. Wire two connections into it:

Audio node output to the Output node (audio port)
Text node output to the Output node (text port)

Wiring the text output alongside the audio gives you the final script for reference, subtitles, or show notes.

Step 5: Run the flow

Click Run in the toolbar. The canvas executes in dependency order:

Trigger fires
Text Input resolves (your raw script)
Gemini rewrites for speech (~1s)
ElevenLabs synthesizes audio (~3s)
Output collects everything
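
Dependency order here means a node runs only after every node wired into it has finished. The sketch below illustrates the idea with Python's standard `graphlib` module; it is not PlugNode's scheduler, and the node names are simply the ones from the table in "What you'll build".

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its incoming wires).
FLOW = {
    "manual-trigger": set(),
    "text-input": {"manual-trigger"},
    "text": {"text-input"},
    "audio": {"text"},
    "output": {"audio", "text"},
}


def execution_order(flow: dict[str, set[str]]) -> list[str]:
    """Return the nodes in an order where every dependency comes first."""
    return list(TopologicalSorter(flow).static_order())


print(execution_order(FLOW))
```

Because the audio node depends on the text node, this graph has exactly one valid order: trigger, input, text, audio, output.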

Total wall-clock time on my test run: 4.2 seconds. Open the Execution Log in the bottom panel to see per-node timing, token counts, and any errors.

Download the MP3 from the execution panel. Drop it into your video editor, podcast DAW, or ad platform.

Publishing as an API

Once the flow works manually, you can automate it. Replace the Manual Trigger with an HTTP Trigger node. Add a Respond to Webhook node wired to your Output.

Hit Publish in the top bar. PlugNode generates a signed URL:

POST https://plugnode.ai/api/trigger/{secret}/{nodeId}

Send a JSON body with your raw script:

{
  "script": "Your rough notes or bullet points here..."
}

The endpoint returns the audio file and cleaned script in the response. Append ?wait=true for synchronous delivery.
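
A minimal sketch of building that request in Python, using only the standard library. The URL pattern, the `script` field, and the `?wait=true` flag come from this tutorial; the `Content-Type` header is an assumption, and `build_trigger_request` is a hypothetical helper name. The response format is not specified here, so the sketch stops short of parsing it.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://plugnode.ai/api/trigger"


def build_trigger_request(secret: str, node_id: str, script: str,
                          wait: bool = True) -> urllib.request.Request:
    """Build the POST request for a published flow without sending it."""
    # Quote the path parts so unusual characters cannot break the URL.
    url = (f"{BASE_URL}/{urllib.parse.quote(secret, safe='')}"
           f"/{urllib.parse.quote(node_id, safe='')}")
    if wait:
        url += "?wait=true"  # ask for synchronous delivery
    body = json.dumps({"script": script}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed header
    )


# Sending is left commented out so the sketch has no side effects:
# request = build_trigger_request("<secret>", "<nodeId>", "Your rough notes...")
# with urllib.request.urlopen(request) as response:
#     payload = response.read()
```

Keep the secret out of source control, for example by reading it from an environment variable.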

Use cases for the API version:

E-commerce: trigger from your product catalog webhook. Every new SKU auto-generates a voiceover for its product video.
Podcasts: batch-process episode scripts from your CMS. Push a button, get all voiceovers for the week.
YouTube: wire it into your production pipeline. Paste the script in your project management tool, webhook fires, voiceover appears in your shared drive.

Rate limit: 60 requests per minute per trigger.
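
For batch jobs, a client-side throttle keeps you under that limit. The sketch below spaces calls evenly, one per second for a limit of 60 per minute. That is stricter than a per-minute window needs to be, which is a deliberate assumption since the tutorial does not say how the window is counted. `Throttle` is a hypothetical helper, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time


class Throttle:
    """Space calls evenly so no more than max_per_minute happen per minute."""

    def __init__(self, max_per_minute: int = 60,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Call `throttle.wait()` immediately before each request in a batch loop.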

Cost comparison: AI voiceover vs. hiring voice talent

Here is what I paid across five test runs of varying script lengths:

Script length	Gemini cost	ElevenLabs cost	Total	Time
15 seconds	$0.0005	$0.003	~$0.004	3s
30 seconds	$0.001	$0.005	~$0.006	4s
60 seconds	$0.001	$0.009	~$0.01	6s
2 minutes	$0.002	$0.018	~$0.02	10s
5 minutes	$0.003	$0.04	~$0.04	18s
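
To compare runs of different lengths, it helps to normalize each total to a cost per minute of audio. The sketch below does that arithmetic on the figures from the table above; `cost_per_minute` is a hypothetical helper name.

```python
# (script seconds, Gemini cost, ElevenLabs cost) copied from the table, in USD.
RUNS = [
    (15, 0.0005, 0.003),
    (30, 0.001, 0.005),
    (60, 0.001, 0.009),
    (120, 0.002, 0.018),
    (300, 0.003, 0.04),
]


def cost_per_minute(seconds: float, gemini: float, elevenlabs: float) -> float:
    """Scale a run's total cost to one minute of finished audio."""
    return (gemini + elevenlabs) * 60.0 / seconds


for seconds, gemini, elevenlabs in RUNS:
    total = gemini + elevenlabs
    per_minute = cost_per_minute(seconds, gemini, elevenlabs)
    print(f"{seconds:>4}s  total ${total:.4f}  per minute ${per_minute:.4f}")
```

On these figures the cost per minute falls from about $0.014 for the 15-second script to just under $0.009 for the 5-minute script.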

Compare that to hiring voice talent:

Method	Cost per minute	Turnaround
Freelance (Fiverr/Upwork)	$25-75	1-3 days
Professional studio	$100-300	3-7 days
PlugNode flow	~$0.01	4-6 seconds

The quality gap has narrowed. ElevenLabs voices sound natural enough for product demos, podcast intros, internal training, and social ads. For flagship brand campaigns where a specific voice actor matters, hire a human. For everything else, this flow handles it.

Troubleshooting

ElevenLabs returns "quota exceeded." Check your ElevenLabs dashboard for character limits on your plan tier. The free tier allows 10,000 characters per month. Upgrade or wait for the monthly reset.
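
A quick way to see how far a character quota stretches is to divide it by the length of a typical cleaned script. The sketch below assumes every character, including spaces and [pause] markers, counts once against the quota; the actual counting rules may differ by model and plan. `runs_within_quota` is a hypothetical helper name.

```python
def runs_within_quota(script: str, monthly_chars: int = 10_000) -> int:
    """Return how many times a script of this length fits in the monthly quota."""
    chars = len(script)
    if chars == 0:
        raise ValueError("script is empty")
    return monthly_chars // chars
```

A 400-character script, for example, fits 25 times into a 10,000-character quota under that assumption.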

Gemini rewrites too aggressively. Add a constraint to the system prompt: "Preserve the original wording as much as possible. Only fix formatting for speech." This keeps the model from paraphrasing your content.

Audio sounds robotic or rushed. Adjust the Gemini system prompt so it inserts more [pause] markers, for example between sentences rather than only between sections. You can also try a different ElevenLabs voice; some handle casual scripts better than others.

The flow runs but the output node is empty. Check that the Audio node's output wire connects to the Output node's audio port. A common mistake is wiring to the text port instead.

What's next

This flow handles single-voice narration. For more complex production, consider these extensions:

Add a second Audio node with a different voice to generate A/B variants of the same script
Chain a Music node (Lyria) after the Audio node to generate background music that matches the voiceover tone
Wire the API version into a Zapier zap or Make scenario to trigger voiceovers from Google Sheets rows

The full use case page is at /use-cases/ai-voiceover-generator.
