How to Produce a Podcast Episode Without a Studio
Build a 5-node flow that takes a topic and returns a ready-to-upload podcast episode. Gemini writes the script, ElevenLabs synthesizes the audio.
I typed a topic into a text field, clicked Run, and had a podcast episode ready to upload 90 seconds later. No microphone. No studio booking. No post-production editing. The whole thing ran inside a 5-node PlugNode flow.
This tutorial covers every node, every wire, and every config decision. By the end you'll have a working flow that takes a topic and returns a downloadable audio file you can ship to any podcast host.
What you'll build
A 5-node flow that chains two AI models:
- Gemini writes a conversational podcast script with speaker tags
- ElevenLabs synthesizes the script into spoken audio
The output: an MP3 file ready for Transistor, Buzzsprout, Spotify for Podcasters, or any RSS-based host.
| Node | Type | Model / Provider | Purpose |
|---|---|---|---|
| manual-trigger | Trigger | None | Starts the flow |
| text-input | Input | None | Episode topic + instructions |
| text | Generation | Gemini 2.5 Flash | Writes podcast script |
| audio | Generation | ElevenLabs | Synthesizes speech |
| output | Output | None | Collects the finished episode |
When "no studio" makes sense
Not every podcast should be AI-generated. Here's where this flow fits:
- Internal comms. Weekly company updates nobody has time to record. Drop in the bullet points, generate the episode, distribute on Slack or your intranet.
- Rapid news recaps. Daily or semi-daily roundups where speed beats polish. Think industry briefings, market summaries, changelog recaps.
- Content repurposing. You wrote a blog post or report. Turn it into a listenable version for people who prefer audio.
- Prototyping. Test an episode concept before investing in a real recording session. Hear how a topic sounds before committing a host's time.
If your show depends on personality, banter, or live interviews, keep the humans. This flow handles the episodes where velocity matters more than vocal chemistry.
Prerequisites
- A PlugNode account (free tier works)
- API keys added in Settings: Gemini, ElevenLabs
- A topic in mind
Open a blank canvas from your dashboard. Everything below happens on that canvas.
Step 1: Add the trigger and topic input
Drag a Manual Trigger node onto the canvas. This fires the flow when you click Run.
Add a Text Input node. Label it "Episode Topic." This holds your topic, target length, and any style notes. Example value:
Topic: The rise of vertical SaaS in 2026.
Length: 5 minutes.
Style: Conversational, two speakers (Host and Guest).
Audience: SaaS founders and operators.
Wire the Manual Trigger to the Text Input. Wire the Text Input to the next node (the Text node).
Step 2: Generate the podcast script with Gemini
Add a Text node. Open its config panel and select Gemini 2.5 Flash as the model.
Set the system prompt:
You are a podcast scriptwriter. Given a topic and style notes,
write a complete episode script. Format rules:
- Use speaker tags: [HOST] and [GUEST]
- Write in a conversational, natural tone
- Include an intro, 3-4 discussion segments, and a closing
- Add natural transitions between segments
- Keep sentences short and speakable
- Avoid jargon unless the audience expects it
- Target the requested episode length (assume 150 words per minute of audio)
Output the script only. No stage directions, no music cues.
Wire the Text Input's output to the Text node's prompt port. Gemini reads your topic and returns a tagged script.
I tested this with the vertical SaaS topic. The script came back in 2.1 seconds. It opened with a natural hook, moved through three segments, and closed with a summary. The speaker tags were consistent throughout.
One thing I noticed: Gemini occasionally writes overly formal transitions. If that happens, add "Use casual transitions, not broadcast-style" to your system prompt. That fixed it for me.
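The 150-words-per-minute assumption in the system prompt turns a requested length into a word budget. Here is a minimal Python sketch of that arithmetic. The function names are illustrative and not part of PlugNode or Gemini.

```python
WORDS_PER_MINUTE = 150  # speaking rate assumed in the system prompt


def target_word_count(minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Return the script length (in words) needed to fill `minutes` of audio."""
    return round(minutes * wpm)


def estimated_minutes(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration by counting whitespace-separated words."""
    return len(script.split()) / wpm


print(target_word_count(5))  # 750
print(target_word_count(8))  # 1200
```

Running `estimated_minutes` on the returned script is a quick way to spot a draft that came back far shorter or longer than requested before you spend synthesis characters on it.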
Step 3: Synthesize audio with ElevenLabs
Add an Audio node. Select ElevenLabs as the provider.
Wire the Text node's output (your full script) to the Audio node's text port. Pick a voice from the dropdown. I used "Drew" for a neutral male voice, but the stock catalog includes Rachel, Clyde, Paul, Domi, Dave, Fin, Sarah, Antoni, and Thomas.
The Audio node reads the entire script and synthesizes it as one continuous MP3. Speaker tags like [HOST] and [GUEST] get spoken as text unless you strip them, so a tagged script needs handling before synthesis.
You have two options:
- Single voice (simpler). Remove speaker tags from the script by adding an instruction to Gemini: "Write the script as a monologue. No speaker tags." One Audio node handles everything.
- Multiple voices (more work). Chain additional Audio nodes, one per speaker. Use a second Text node to split the script by speaker tag, then wire each speaker's lines to a separate Audio node with a different voice. You'll get separate MP3 files per speaker and need to interleave them externally.
For this tutorial, I went with option 1. The single-voice approach produces a clean, listenable episode in one pass.
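If you would rather keep the tagged script and process it outside the flow, both options come down to a few lines of text handling. The sketch below is an assumption about how you might do that in Python. It is not how PlugNode's Text node does the split, and the function names are made up for illustration.

```python
import re

# Matches the [HOST] / [GUEST] tags requested in the system prompt.
SPEAKER_TAG = re.compile(r"\[(HOST|GUEST)\]\s*")


def strip_speaker_tags(script: str) -> str:
    """Remove speaker tags so a single voice does not read them aloud."""
    return SPEAKER_TAG.sub("", script)


def split_by_speaker(script: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs in the order they appear in the script."""
    parts = SPEAKER_TAG.split(script)
    # With one capture group, re.split yields [preamble, tag, text, tag, text, ...].
    pairs = zip(parts[1::2], parts[2::2])
    return [(speaker, text.strip()) for speaker, text in pairs if text.strip()]


sample = "[HOST] Welcome back.\n[GUEST] Glad to be here.\n[HOST] Let's dive in."
print(strip_speaker_tags(sample))
print(split_by_speaker(sample))
```

`split_by_speaker` keeps the original turn order, which is the information you need later when interleaving per-speaker MP3 files.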
Synthesis time depends on script length. A 5-minute episode (roughly 750 words) took about 8 seconds.
Step 4: Collect the output
Add an Output node. Wire two connections into it:
- Audio node output → Output node (audio port)
- Text node output → Output node (text port)
The text output gives you the full script, which doubles as show notes. The audio output is your episode file.
Step 5: Run the flow
Click Run in the toolbar. The canvas executes nodes in order:
- Trigger fires
- Text Input resolves (your topic)
- Gemini writes the script (~2s)
- ElevenLabs synthesizes audio (~8s for a 5-minute episode)
- Output collects everything
Total time on my test run: 11 seconds. Open the Execution Log in the bottom panel to see per-node timing and token counts.
Download the MP3 from the Output node's execution panel. Play it back before you publish. Listen for pronunciation issues, unnatural pauses, or pacing problems.
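The execution order follows from the wiring: a node can only run once everything wired into it has finished. As a rough model of that idea (not PlugNode's actual scheduler), the flow can be written as a dependency graph and sorted with Python's standard library:

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes wired into it (its upstream dependencies).
flow = {
    "manual-trigger": set(),
    "text-input": {"manual-trigger"},
    "text": {"text-input"},
    "audio": {"text"},
    "output": {"audio", "text"},
}

execution_order = list(TopologicalSorter(flow).static_order())
print(execution_order)
# ['manual-trigger', 'text-input', 'text', 'audio', 'output']
```

Because the Output node depends on both the Text and Audio nodes, it always runs last, even though the script is ready several seconds before the audio.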
Exporting to podcast hosts
The MP3 file from PlugNode is ready to upload to any podcast host. Here's the quick path for the most common ones:
Transistor. Log in, click New Episode, upload the MP3, paste the script as show notes, set your publish date. Transistor handles RSS distribution to Apple Podcasts, Spotify, and other directories.
Buzzsprout. Same flow. Upload, add episode details, publish. Buzzsprout also offers automatic transcription, but you already have the script from the Text node output.
Spotify for Podcasters (formerly Anchor). Upload the MP3, add a title and description from your script. One-click publish to Spotify.
RSS-based hosts. Any host that accepts MP3 uploads works. The file is standard 44.1 kHz MP3. No special encoding required.
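If you want to confirm the sample rate before uploading, the MPEG frame header carries it. The sketch below uses only the Python standard library and reads the first plausible frame header. The function names and the `episode.mp3` path are illustrative, and this is a quick sanity check rather than a full MP3 validator.

```python
from typing import Optional

# Sample rates (Hz) indexed by the MPEG version bits, then by the 2-bit rate index.
SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),  # MPEG-1
    0b10: (22050, 24000, 16000),  # MPEG-2
    0b00: (11025, 12000, 8000),   # MPEG-2.5
}


def skip_id3v2(data: bytes) -> int:
    """Return the offset of the first byte after an ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    # The tag size is stored as four 7-bit ("synchsafe") bytes.
    size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
    footer = 10 if data[5] & 0x10 else 0
    return 10 + size + footer


def mp3_sample_rate(data: bytes) -> Optional[int]:
    """Return the sample rate of the first plausible MPEG audio frame header."""
    for i in range(skip_id3v2(data), len(data) - 3):
        b1, b2, b3 = data[i], data[i + 1], data[i + 2]
        if b1 != 0xFF or (b2 & 0xE0) != 0xE0:
            continue  # no 11-bit frame sync at this offset
        version = (b2 >> 3) & 0b11
        layer = (b2 >> 1) & 0b11
        bitrate_index = (b3 >> 4) & 0b1111
        rate_index = (b3 >> 2) & 0b11
        if version not in SAMPLE_RATES or layer == 0 or bitrate_index == 15 or rate_index == 3:
            continue  # reserved values, so this is not a real header
        return SAMPLE_RATES[version][rate_index]
    return None


# Usage (hypothetical file name):
# with open("episode.mp3", "rb") as f:
#     print(mp3_sample_rate(f.read()))
```

A result other than 44100 does not necessarily mean the host will reject the file, but it is worth knowing before you publish.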
For episode artwork, consider adding an Image node to your flow. Wire the topic to a Gemini or Nano Banana image prompt, generate cover art, and download it alongside the audio. That's a separate tutorial, but it takes two extra nodes.
Automating weekly episodes
Once the flow works manually, you can publish it as an API endpoint. Replace the Manual Trigger with an HTTP Trigger node. Add a Respond to Webhook node wired to your Output.
Hit Publish in the top bar. PlugNode generates a signed URL:
POST https://plugnode.ai/api/trigger/{secret}/{nodeId}
Send a POST with a JSON body containing your topic:
{
  "topic": "This week in AI: April 28, 2026",
  "length": "8 minutes",
  "style": "Solo host, news roundup format"
}
Wire this into a cron job or your CMS and you have a recurring podcast pipeline. New topic goes in, finished episode comes out.
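A script that a cron job could call might look like the sketch below. It uses only Python's standard library. The `Content-Type` header and the decision to return the raw response are assumptions, since this tutorial does not document the endpoint's expected headers or response format. Replace the placeholder URL with the signed URL PlugNode gives you.

```python
import json
import urllib.request

# Placeholder: substitute the signed URL shown after you hit Publish.
TRIGGER_URL = "https://plugnode.ai/api/trigger/{secret}/{nodeId}"


def build_trigger_request(url: str, topic: str, length: str, style: str) -> urllib.request.Request:
    """Build the POST request carrying the episode parameters as JSON."""
    body = json.dumps({"topic": topic, "length": length, "style": style}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not confirmed by the tutorial
        method="POST",
    )


def trigger_episode(request: urllib.request.Request, timeout: float = 120.0) -> bytes:
    """Send the request and return the raw, unparsed response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


request = build_trigger_request(
    TRIGGER_URL,
    topic="This week in AI: April 28, 2026",
    length="8 minutes",
    style="Solo host, news roundup format",
)
# trigger_episode(request)  # uncomment once TRIGGER_URL holds your real signed URL
```

Keeping request construction separate from sending makes it easy to log or inspect the payload before anything hits the network.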
Honest limits: where human hosts still win
I've run about 20 episodes through this flow. Here's where it falls short:
Emotional range. ElevenLabs voices are good, but they don't laugh, pause for effect, or react to surprising information. Conversational podcasts that depend on spontaneity lose something.
Interviews. You can script a fake interview, but listeners can tell. If your format relies on back-and-forth dialogue with a real guest, record it.
Brand voice. A synthesized voice doesn't build parasocial relationships the way a human host does. For shows where the host IS the brand, keep the human.
Pronunciation of niche terms. ElevenLabs handles common words well. Obscure product names, technical acronyms, or non-English words sometimes get mangled. You can add phonetic hints in the script to work around this.
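One way to apply phonetic hints consistently is a small substitution table run over the script before it reaches the Audio node. This is a sketch under assumptions: the respellings below are hypothetical examples, and you would tune them by ear against the voice you picked.

```python
import re

# Hypothetical respellings; adjust after listening to the synthesized result.
PRONUNCIATIONS = {
    "SaaS": "sass",
    "PostgreSQL": "post-gress-cue-ell",
    "nginx": "engine-x",
}


def apply_phonetic_hints(script: str, hints: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive occurrences of each term with its respelling."""
    if not hints:
        return script
    # Longest terms first so a short term never pre-empts a longer one.
    terms = sorted(hints, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b")
    return pattern.sub(lambda match: hints[match.group(1)], script)


print(apply_phonetic_hints("Vertical SaaS teams often run PostgreSQL behind nginx.", PRONUNCIATIONS))
# Vertical sass teams often run post-gress-cue-ell behind engine-x.
```

Apply the substitutions only to the copy that goes to synthesis, so the script you reuse as show notes keeps the correct spellings.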
Episode length. The ElevenLabs Audio node has character limits based on your plan tier. Long episodes (30+ minutes) may need to be split into chunks. Check your ElevenLabs dashboard for your current limit.
This flow is best for utility content: updates, summaries, recaps, how-tos. It is not a replacement for a talented host with a real microphone.
Troubleshooting
Gemini produces a script with stage directions. Add "No stage directions, no music cues, no [PAUSE] markers" to the system prompt. Gemini sometimes defaults to a screenplay format.
ElevenLabs returns "quota exceeded." Check your ElevenLabs dashboard for character limits on your plan tier. The Audio node passes through provider errors directly.
Audio sounds robotic or rushed. Try a different voice. Some ElevenLabs voices handle long-form content better than others. "Drew" and "Rachel" consistently performed well in my tests.
Script is too long for one synthesis call. Split the script with a second Text node that chunks it into segments under your character limit. Run each chunk through a separate Audio node.
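The chunking itself can be sketched in a few lines of Python if you prefer to do it outside the flow. This version breaks only between sentences so no chunk cuts a sentence in half. `max_chars` is a stand-in for whatever limit your ElevenLabs plan shows, and the function name is illustrative.

```python
import re


def chunk_script(script: str, max_chars: int) -> list[str]:
    """Split a script into chunks of at most `max_chars`, breaking between sentences."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) > max_chars:
            raise ValueError("A single sentence exceeds max_chars; shorten it or raise the limit.")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_script("One. Two. Three. Four.", max_chars=10))
# ['One. Two.', 'Three.', 'Four.']
```

The chunks come back in reading order, so the resulting audio files can be concatenated in the same order.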
FAQ
AI Podcast Production: Common Questions
How much does one episode cost?
You pay each provider at their standard rates. A typical 5-minute episode: Gemini Flash (~$0.002 for the script), ElevenLabs (~$0.03-0.08 for 750 words of audio). Total: roughly $0.03-0.08 per episode. No PlugNode markup.
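To project those per-episode figures over a publishing schedule, the arithmetic is simple enough to sketch in Python. The constants are the estimates quoted above, not live pricing, and the function names are illustrative.

```python
# Per-episode estimates quoted above, in US dollars, for a 5-minute episode.
SCRIPT_COST = 0.002
AUDIO_COST_RANGE = (0.03, 0.08)


def episode_cost_range() -> tuple[float, float]:
    """Return the (low, high) total cost of one episode."""
    low, high = AUDIO_COST_RANGE
    return round(SCRIPT_COST + low, 3), round(SCRIPT_COST + high, 3)


def yearly_cost_range(episodes_per_week: int) -> tuple[float, float]:
    """Return the (low, high) cost of a year of episodes at the given cadence."""
    low, high = episode_cost_range()
    episodes = episodes_per_week * 52
    return round(low * episodes, 2), round(high * episodes, 2)


print(episode_cost_range())  # (0.032, 0.082)
print(yearly_cost_range(1))  # (1.66, 4.26)
```

Even a daily weekday show (`yearly_cost_range(5)`) stays in the tens of dollars per year at these rates, so provider character limits are likely to matter more than cost.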
Can I use my own cloned voice?
Yes, if your ElevenLabs plan supports voice cloning. Clone your voice in ElevenLabs, then select it from the Audio node's voice dropdown. The flow works the same way.
Can I add background music?
Add a Music node to generate a background track, or use a Sound Effects node for an intro jingle. You'll need to mix the tracks externally (in Audacity, GarageBand, or similar) since PlugNode outputs separate audio files.
Can I swap Gemini for OpenAI?
Yes. The Text node supports both. Open config, switch the model to GPT-4.1 or o3. The rest of the flow stays wired. I found Gemini slightly better at conversational scripts, but results vary by topic.
Does the flow handle multi-episode series?
Not in one run. Each run produces one episode. For a series, call the HTTP Trigger endpoint once per episode with a different topic. You can batch these calls from a script or spreadsheet.
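For the spreadsheet route, one approach is to export the series plan as CSV and turn each row into a request body. The sketch below assumes the same topic, length, and style fields used in the JSON example. The series data is hypothetical.

```python
import csv
import io
import json

# Hypothetical series plan; in practice this could be exported from a spreadsheet.
SERIES_CSV = """topic,length,style
Vertical SaaS basics,5 minutes,Solo host
Pricing vertical SaaS,5 minutes,Solo host
Hiring for vertical SaaS,5 minutes,Solo host
"""


def payloads_from_csv(csv_text: str) -> list[str]:
    """Return one JSON request body per row, using the topic/length/style columns."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        json.dumps({"topic": row["topic"], "length": row["length"], "style": row["style"]})
        for row in rows
    ]


for body in payloads_from_csv(SERIES_CSV):
    print(body)  # POST each body to your published trigger URL, one call per episode
```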
What audio format does the output use?
MP3 at 44.1 kHz, which is the standard for podcast hosting platforms. No conversion needed before uploading.