December 4, 2010

Will Google’s Synthetic Voice Quality Improve with Phonetic Arts?

Google recently acquired Phonetic Arts. Will the quality of Google’s Synthetic Voice improve with Phonetic Arts?

Google has been investing heavily in voice research, and rightly so. Voice is the most natural interface for human interaction. However, voice is also an extremely difficult technology, and no one has come close to perfecting it, whether Microsoft Speech, AT&T Natural Voices, or Nuance.

Both input (voice recognition) and output (text to speech) are difficult, with input being the more difficult of the two. As GOOG-411 also showed, voice recognition has always been a good research project but a bad business.

With the Phonetic Arts acquisition, Google plans to improve its output (synthetic voice). An excellent use of synthetic voice is shown in the video below: two people ordering Indian food in Hindi using Google Translate and Google’s synthetic voice.

It is awesome to watch people who do not speak a language carry on a conversation in that language. Imagine traveling all over the world and being able to converse with people without needing to speak their language. And this is just one of the many use cases of synthetic voice. However, the synthetic voice in the video does not sound natural. This is what Google can improve with Phonetic Arts.

Phonetic Arts says it can create a natural-sounding synthetic voice for any person using just a few samples of their voice. Each person can then have their own synthetic voice, usable in different situations. However, the Dirk, Riawenna, and Jasper sample voices on the Phonetic Arts website do not sound natural. They are not smooth, and they have the clipped quality of a text-to-speech voice. Google will need to continue improving this. There is a big prize waiting at the end of this rainbow. When perfected, the voice interface will have an enormous impact across industries worldwide and generate huge amounts of revenue.