OpenAI has once again made waves in speech technology with its latest creation: Voice Engine.
Given a text prompt and a single 15-second audio sample, it generates natural-sounding speech that closely resembles the original speaker's voice.
Notably, even with a small model and such a short sample, the output can be emotionally expressive and lifelike.
Voice Engine was first developed in late 2022 and debuted as a preview on March 29th.
Broad Range of Applications
The technology has a wide range of applications: it can translate videos and podcasts while keeping a speaker's voice consistent across languages, and it can help people with speech impairments communicate.
Notably, Voice Engine is not trained or fine-tuned on user data; it generates speech using a diffusion process combined with a transformer.
According to Jeff Harris, a product manager at OpenAI, the system takes a short audio sample and a piece of text and generates realistic speech that matches the original speaker. The audio sample is discarded once the request is complete.
Speech generation is already a crowded field, with companies such as ElevenLabs, Replica Studios, and Papercup, while large tech companies like Amazon, Google, and Microsoft have long been involved as well. Harris claims that OpenAI's approach produces higher-quality speech.
Cost Considerations
Pricing details were left out of OpenAI's announcement, but according to TechCrunch, Voice Engine costs $15 per million characters. That is enough to cover Dickens' “Oliver Twist” with some room to spare and corresponds to roughly 18 hours of audio, which works out to a little under $1 per hour ($15 / 18 ≈ $0.83).
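This is considerably cheaper than ElevenLabs, which charges $11 per month for 100,000 characters, or about $110 per million characters if the full monthly quota is used, more than seven times Voice Engine's rate. However, Voice Engine currently offers no controls for adjusting tone, pitch, or rhythm.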
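To make the price comparison concrete, the short script below normalizes both reported prices to a per-million-character rate and converts Voice Engine's price to a cost per hour of audio. It is only a back-of-the-envelope sketch: the constants are the figures quoted above, and it assumes (as a simplification) that an ElevenLabs subscriber uses the entire 100,000-character monthly quota.

```python
# Reported figures: Voice Engine at $15 per 1M characters (~18 hours of audio);
# ElevenLabs at $11/month for 100,000 characters.
VOICE_ENGINE_USD_PER_MILLION_CHARS = 15.0
VOICE_ENGINE_AUDIO_HOURS_PER_MILLION_CHARS = 18.0
ELEVENLABS_USD_PER_MONTH = 11.0
ELEVENLABS_CHARS_PER_MONTH = 100_000


def usd_per_million_chars(price_usd: float, chars: int) -> float:
    """Scale a price paid for `chars` characters to a per-million-character rate."""
    return price_usd * 1_000_000 / chars


voice_engine_rate = VOICE_ENGINE_USD_PER_MILLION_CHARS
# Assumption: the full monthly quota is consumed, so no characters go unused.
elevenlabs_rate = usd_per_million_chars(ELEVENLABS_USD_PER_MONTH, ELEVENLABS_CHARS_PER_MONTH)
voice_engine_per_audio_hour = (
    VOICE_ENGINE_USD_PER_MILLION_CHARS / VOICE_ENGINE_AUDIO_HOURS_PER_MILLION_CHARS
)

print(f"Voice Engine: ${voice_engine_rate:.2f} per 1M chars, "
      f"${voice_engine_per_audio_hour:.2f} per hour of audio")
print(f"ElevenLabs:   ${elevenlabs_rate:.2f} per 1M chars")
print(f"ElevenLabs costs {elevenlabs_rate / voice_engine_rate:.1f}x as much per character")
```

With these inputs the script reports $110.00 per million characters for ElevenLabs against $15.00 for Voice Engine (about 7.3 times as much), and roughly $0.83 per hour of Voice Engine audio. If a subscriber leaves part of the monthly quota unused, ElevenLabs' effective rate only goes up.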
Impact on the Voice Acting Industry
If OpenAI's audio tool gains popularity, what will happen to voice actors? Voice actor pay listed on ZipRecruiter ranges from $12 to $79 per hour, far more than Voice Engine's cost of under $1 per hour of generated audio, even at the entry level.
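The two figures are not measured in the same unit: a voice actor is paid per hour of work, while Voice Engine is priced per hour of finished audio. The sketch below compares them under a deliberately generous, hypothetical assumption that one paid hour of an actor's time yields one hour of usable audio. In practice retakes and editing usually yield less, which would only widen the gap. The constants come from the figures quoted in this article; `cost_ratio` is a hypothetical helper for illustration.

```python
# Voice Engine's reported price: $15 per 1M characters, ~18 hours of audio.
VOICE_ENGINE_USD_PER_AUDIO_HOUR = 15.0 / 18.0  # about $0.83
# Reported ZipRecruiter pay range for voice actors, in USD per hour of work.
ZIPRECRUITER_HOURLY_RANGE_USD = (12.0, 79.0)


def cost_ratio(hourly_wage_usd: float) -> float:
    """How many times Voice Engine's per-audio-hour cost a given hourly wage is.

    Hypothetical simplification: one hour of paid actor time produces
    one hour of finished audio.
    """
    return hourly_wage_usd / VOICE_ENGINE_USD_PER_AUDIO_HOUR


for wage in ZIPRECRUITER_HOURLY_RANGE_USD:
    print(f"${wage:.0f}/hour actor: about {cost_ratio(wage):.1f}x "
          f"Voice Engine's cost per hour of audio")
```

Under that assumption, the low end of the range ($12) is about 14.4 times Voice Engine's cost per audio hour and the high end ($79) about 94.8 times.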
Addressing Safety and Privacy Concerns
Finally, there are the concerns about safety and privacy. OpenAI says it has taken these issues seriously from the outset, and its usage policies explicitly prohibit impersonating individuals or organizations without authorization.
It has also put safety measures in place, including watermarking audio generated by Voice Engine so its origin can be traced, and proactively monitoring how the tool is used.