Google Cloud gets text-to-speech powers to work with IoT devices, voice systems

Google's Cloud Text-to-Speech uses DeepMind's WaveNet technology to let developers convert text into natural-sounding speech.

Building a slide deck, pitch, or presentation? Here are the big takeaways:
  • Google's Cloud Text-to-Speech offering will allow developers to power voice response systems for call centers, enable IoT devices to talk back to users, and convert text-based media into a spoken format.
  • Google's Cloud Text-to-Speech lets users choose from 32 voices across 12 languages and variants.

On Tuesday, Google unveiled its Cloud Text-to-Speech service, which allows developers to convert text into natural-sounding speech in a variety of products.

Cloud Text-to-Speech has a number of uses, including powering voice response systems for call centers and enabling real-time natural language conversations, according to a Google blog post. It can also be used to enable Internet of Things (IoT) devices, including TVs, cars, and robots, to talk back to users. Finally, Cloud Text-to-Speech can convert text-based media such as news articles or books into a spoken format, such as a podcast or audiobook.

The service lets users choose from 32 voices across 12 languages and variants, according to the post. It can correctly pronounce complex text such as names, dates, times, and addresses, and lets users customize the pitch, speaking rate, and volume gain, the post noted. Cloud Text-to-Speech also supports a number of audio formats, including MP3 and WAV.
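
To make those options concrete, here is a minimal sketch of how a request to the service might be assembled. It only builds the JSON body and sends nothing. The field names (`input`, `voice`, `audioConfig`, `audioEncoding`, `pitch`, `speakingRate`, `volumeGainDb`) follow the REST API's text synthesis request as we understand it and should be checked against Google's documentation. The helper `build_synthesize_request` and the voice name `en-US-Wavenet-D` are illustrative choices and do not come from the blog post.

```python
import json


def build_synthesize_request(text, language_code="en-US", voice_name=None,
                             audio_encoding="MP3", pitch=0.0,
                             speaking_rate=1.0, volume_gain_db=0.0):
    """Build a JSON-serializable request body (hypothetical helper).

    Field names are assumed to match the Cloud Text-to-Speech REST API;
    verify them against the official reference before relying on them.
    """
    voice = {"languageCode": language_code}
    if voice_name is not None:
        # Optional: pin a specific voice instead of the language default.
        voice["name"] = voice_name
    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": audio_encoding,   # e.g. "MP3"
            "pitch": pitch,                    # 0.0 = voice default
            "speakingRate": speaking_rate,     # 1.0 = normal speed
            "volumeGainDb": volume_gain_db,    # 0.0 = no gain change
        },
    }


# Example: a slightly slower WaveNet voice reading a date aloud.
body = build_synthesize_request(
    "Your order ships on March 30.",
    voice_name="en-US-Wavenet-D",  # illustrative voice name
    speaking_rate=0.9,
)
print(json.dumps(body, indent=2))
```

In a real integration, this body would be sent to the service with valid credentials, and the response would carry the encoded audio.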

Cloud Text-to-Speech also includes a selection of high-fidelity voices built with WaveNet, a generative model for raw audio created by Google subsidiary DeepMind, the post noted. The original version of WaveNet, published more than a year ago, created raw audio waveforms from scratch using a neural network trained on speech samples.

Google is now using an updated version of WaveNet that runs on Google's Cloud TPU infrastructure, the post noted. This updated model can generate raw waveforms 1,000x faster than the original, and can generate one second of speech in 50 milliseconds. It can also create higher-fidelity waveforms with improved resolution for a more natural human sound, the post said.
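
A quick back-of-the-envelope calculation shows what those figures imply. The sketch below assumes the 50-millisecond figure and the 1,000x speedup describe the same measure (compute time per second of generated speech), which the post does not state explicitly. The 10-minute article is a made-up example.

```python
# Figures quoted above, attributed to Google's blog post.
UPDATED_MS_PER_AUDIO_SECOND = 50   # 50 ms of compute per 1 s of speech
SPEEDUP_OVER_ORIGINAL = 1000       # "1,000x faster than the original"


def real_time_factor(ms_per_audio_second):
    """How many seconds of speech are produced per second of compute."""
    return 1000 / ms_per_audio_second


def synthesis_time_ms(audio_seconds, ms_per_audio_second):
    """Compute time in milliseconds for a clip of the given length."""
    return audio_seconds * ms_per_audio_second


rtf = real_time_factor(UPDATED_MS_PER_AUDIO_SECOND)

# Hypothetical example: a 10-minute (600-second) spoken article.
ten_minute_clip_ms = synthesis_time_ms(600, UPDATED_MS_PER_AUDIO_SECOND)

# Implied cost of the original model, under the assumption stated above.
implied_original_ms = UPDATED_MS_PER_AUDIO_SECOND * SPEEDUP_OVER_ORIGINAL

print(f"Real-time factor: {rtf:.0f}x")
print(f"10-minute clip: {ten_minute_clip_ms / 1000:.0f} s of compute")
print(f"Implied original: {implied_original_ms / 1000:.0f} s per second of speech")
```

On these numbers, the updated model runs about 20 times faster than real time, while the original would have needed roughly 50 seconds of compute for each second of speech.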

In tests, people gave the updated US English WaveNet voices an average score of 4.1 out of 5, the post said. That is more than 20% better than the score for standard voices, and it reduces the gap with human speech by over 50%.

For those interested in learning more, Google has also provided product documentation and pricing. To get started with the public beta or try out the new voices, you can visit the Cloud Text-to-Speech website.

About Alison DeNisco Rayome

Alison DeNisco Rayome is a Senior Editor for TechRepublic. She covers CXO, cybersecurity, and the convergence of tech and the workplace.