I’ve just published my first podcast episode (Episode 128) using Amazon Polly—Amazon’s AI-based text-to-speech engine.
Listen to the episode here.
It’s quite good—much better than the few others that I’ve tried.
What I quite like about it for my podcast is that I can use SSML syntax to insert pauses at appropriate points. For example, I like a one-second pause between items and a two-second pause on either side of section titles.
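As a rough illustration of those pauses, here is a minimal Python sketch that builds the SSML for one section. The function name `build_ssml` and the sample title and items are made up for this example; the `<speak>` wrapper and `<break time="..."/>` tag are standard SSML that Polly accepts.

```python
from xml.sax.saxutils import escape


def build_ssml(sections, item_pause="1s", title_pause="2s"):
    """Build an SSML string from (title, items) pairs.

    Each title gets a longer pause before and after it; items within
    a section are separated by a shorter pause.
    """
    parts = ["<speak>"]
    for title, items in sections:
        parts.append(f'<break time="{title_pause}"/>')
        parts.append(escape(title))  # escape &, <, > so the SSML stays valid
        parts.append(f'<break time="{title_pause}"/>')
        # Join the items with the shorter pause between them
        parts.append(f'<break time="{item_pause}"/>'.join(escape(i) for i in items))
    parts.append("</speak>")
    return "".join(parts)


# Hypothetical section title and items, for illustration only
print(build_ssml([("Security News", ["First story.", "Second story."])]))
```

The output is a single string you could paste into the Polly console with the SSML option selected.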
Here are the basic steps.
Complete your text.
Log into Amazon Polly’s console.
Paste a piece of text into the window. Start with a small piece, for numerous reasons.
Open the text with a <speak> tag, like you see in the example above.
Insert any additional SSML, such as pauses, optional different voices, speed changes, etc.
Close out with the matching closing tag, </speak>.
Download the audio as an MP3.
You can now upload that clip as audio for your podcast or other type of production.
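I did all of this by hand in the console, but the same steps can also be scripted. Below is a minimal sketch, not what I actually used, that assumes the boto3 library and configured AWS credentials. The function name `synthesize_to_mp3`, the output path, and the default voice "Joanna" are assumptions for illustration; `synthesize_speech` with `TextType="ssml"` and `OutputFormat="mp3"` is the Polly API call.

```python
def synthesize_to_mp3(polly_client, ssml, out_path, voice_id="Joanna"):
    """Send SSML to Polly and write the returned MP3 bytes to out_path.

    polly_client is expected to be a boto3 Polly client. The voice_id
    default is an assumption; use any voice available in your account.
    """
    response = polly_client.synthesize_speech(
        Text=ssml,
        TextType="ssml",     # tell Polly the input is SSML, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    audio = response["AudioStream"].read()  # the MP3 data comes back as a stream
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   synthesize_to_mp3(boto3.client("polly"), "<speak>Hello.</speak>", "clip.mp3")
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before pointing it at a real AWS account.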
You will have to keep each clip below 3000 characters, which is slightly annoying if you have lots of content in your shows. It just means you’ll have to break the text into multiple clips, which were pretty easy to reassemble using Adobe Audition.
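Breaking the text up can be automated too. Here is a minimal Python sketch that splits a script into chunks under a character limit, breaking only at sentence boundaries. The function name `split_into_clips` is hypothetical, and the limit here counts only the plain text, so leave some headroom if the tags you add afterwards count toward the limit in your setup.

```python
import re


def split_into_clips(text, limit=3000):
    """Split text into chunks of at most `limit` characters.

    Chunks break only between sentences, so no sentence is cut in half.
    """
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    clips, current = [], ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("a single sentence exceeds the limit")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate      # sentence still fits in this clip
        else:
            clips.append(current)    # clip is full; start a new one
            current = sentence
    if current:
        clips.append(current)
    return clips


# Tiny limit used here only to make the splitting visible
for number, clip in enumerate(split_into_clips("One. Two. Three.", limit=10), 1):
    print(number, clip)
```

Each chunk can then be wrapped in its own speak tags and synthesized as a separate clip.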