Expression is the essence of any language. We understand the meaning of a spoken sentence not only from the individual meanings of its words, taken in the order they occur, but also from how those words are actually spoken. Improper intonation can, in fact, change the meaning of a sentence significantly.
Along these lines, AT&T is developing StorEbook, a text-to-speech app intended to read children’s stories in character-appropriate voices. According to AT&T, talking to our devices has become common and genuinely useful; however, the way devices talk back to us lacks expression and therefore sounds unnatural.
The app was demonstrated at a Foundry event, where it read the children’s story “Goldilocks and the Three Bears”. In StorEbook’s rendition, Papa Bear speaks in a deeper voice and Baby Bear in a higher one. According to AT&T, the app is designed to apply appropriate voice inflection so that it can convey expressions such as anger, sadness, questioning, surprise, exclamation, authority, and fear. It does this by drawing on an extensive library of phonemes, which lets it speak naturally in a range of distinct voices.
This web-based app uses AT&T’s Natural Voices, a state-of-the-art text-to-speech product that converts text into synthetic but natural-sounding speech in various voices and languages. The lead researcher on the project is Taniya Mishra, a senior member of the Speech Algorithms and Engines Research Department at AT&T Labs. Mishra says the idea came to her while she was reading a story to her young daughter, adding that three-year-olds are quick to walk away from anything that sounds like a computer reading a story.
The project is still in development. Mishra envisions a system intelligent enough to handle text and render it naturally on its own, analyzing its key traits and choosing the voice needed to deliver natural speech for that text. The system would also be capable of advanced affect generation, meaning that a wolf would sound scary and a teacup would sound cute. Finally, a Personalized Voices feature would let the app read a story in the voice of a familiar person, using just a few hundred of that person’s recorded spoken words.
Imagine how delighted your kids would be to hear you reading their favorite story, even when you are miles away.