How Google AI’s Parrotron ASR System Helps the Speech Impaired

Article by Limarc Ambalina | August 08, 2019

“Most people take for granted that when they speak, they will be heard and understood.”

-Fadi Biadsy, Research Scientist at Google AI

Parrotron is a new project in development that aims to make speech-to-text programs, virtual assistants, and other voice recognition software accessible to people with speech impediments. The project is a joint effort between the Speech team and the Google Brain team at Google AI. People with neurological or physical speech impediments can have difficulty being understood by other people, let alone by automatic speech recognition (ASR) systems. Research scientists and engineers at Google AI are developing Parrotron to help people with speech impairments communicate better with both humans and ASR systems.


An overview of the Parrotron model

Virtual assistants, chatbots, and other natural language processing (NLP) solutions that rely on ASR are often inaccessible to people with speech impediments. Parrotron uses an end-to-end deep neural network to convert irregular or atypical speech into fluent speech. Most ASR programs convert input speech to text so that the system can interpret what was said and respond accordingly. Parrotron, by contrast, converts speech directly, or “parrots” it, without an intermediate speech-to-text step.
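To make the contrast concrete, here is a minimal Python sketch of the two data flows. It is not Parrotron's code: the `Frames` type, the function names, and the toy components are all hypothetical placeholders, chosen only to show that the cascaded route passes through text while the direct route never leaves the audio domain.

```python
from typing import Callable, List

# Toy stand-in for audio features: one float per frame. Real systems work on
# spectrogram frames; this simplified type is an assumption for illustration.
Frames = List[float]


def cascaded_pipeline(
    audio: Frames,
    recognize: Callable[[Frames], str],
    synthesize: Callable[[str], Frames],
) -> Frames:
    """Speech -> text -> speech. Any recognition error is carried into the text."""
    text = recognize(audio)
    return synthesize(text)


def direct_pipeline(audio: Frames, convert: Callable[[Frames], Frames]) -> Frames:
    """Speech -> speech. A single model maps input frames to output frames."""
    return convert(audio)


# Hypothetical placeholder components so the sketch runs end to end.
def toy_recognize(audio: Frames) -> str:
    # Pretend recognizer: returns a fixed word when the signal has energy.
    return "hello" if sum(audio) > 0 else ""


def toy_synthesize(text: str) -> Frames:
    # Pretend synthesizer: one frame per character.
    return [float(ord(ch)) for ch in text]


def toy_convert(audio: Frames) -> Frames:
    # Pretend speech-to-speech model: clamps each frame into [0.0, 1.0].
    return [min(max(x, 0.0), 1.0) for x in audio]
```

In the cascaded version, everything the synthesizer knows about the utterance has to fit into the intermediate string, so a misrecognized word cannot be recovered later. The direct version has no such bottleneck, which is the property the article attributes to Parrotron.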

The video below features Dimitri Kanevsky, a research scientist at Google who is profoundly deaf and learned to speak English using Russian phonetic pronunciations. Dimitri demonstrates Google Assistant’s ability to understand his questions before and after using Parrotron.

Parrotron Demo 1

Parrotron is trained in two phases. The first phase uses millions of utterance pairs, each consisting of a natural utterance paired with a synthesized speech utterance. The natural utterances cover a variety of accents, dialects, and noise conditions, which establishes a baseline for “typical” speech. The second phase adapts the model to the input speaker’s atypical speech patterns, which differ for every speaker. Input speakers contribute utterances to the training data so that the model can learn each speaker’s unique speech characteristics.
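The two-phase idea (train on a large general dataset, then continue training on a small speaker-specific one) can be sketched with a deliberately tiny model. The example below is an illustration under loud assumptions: it uses a one-feature linear model and plain stochastic gradient descent, not Parrotron's neural network, and the data, function names, and hyperparameters are all made up for the demonstration.

```python
from typing import List, Tuple

Pair = Tuple[float, float]      # (input feature, target feature)
Params = Tuple[float, float]    # (weight, bias) of a toy linear model


def train(params: Params, pairs: List[Pair], lr: float, epochs: int) -> Params:
    """Run stochastic gradient descent on squared error, starting from params."""
    w, b = params
    for _ in range(epochs):
        for x, y in pairs:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mean_squared_error(params: Params, pairs: List[Pair]) -> float:
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in pairs) / len(pairs)


# Phase 1 (assumed data): "typical" inputs already match their targets.
typical_pairs = [(x, x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

# Phase 2 (assumed data): one speaker's inputs are shifted and scaled
# relative to the same targets, standing in for atypical speech.
speaker_pairs = [(2.0 * y + 1.0, y) for y in (-1.0, 0.0, 1.0)]

base_model = train((0.0, 0.0), typical_pairs, lr=0.1, epochs=200)
adapted_model = train(base_model, speaker_pairs, lr=0.05, epochs=500)
```

The key design point is that phase two starts from `base_model` rather than from scratch: the general model supplies a starting point, and the speaker's own utterances pull it toward that speaker's patterns. In this toy setup, the base model fits the speaker's data poorly, while the adapted model fits it closely.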

Parrotron Demo 2

Google AI has also used utterances from the ALS speech corpus to train the model on general ALS speech patterns. For people with unique speech characteristics, the model appears to take longer to train. Kanevsky, for example, contributed 15 hours of speech to train the model on his voice. The results are nonetheless remarkable, and the technology could offer substantial benefits to people with speech impairments.

Do you require your own audio corpus for ASR, text-to-speech, or other machine learning projects? Lionbridge AI provides a variety of audio data collection and audio data annotation services for machine learning. With a specialization in linguistics and a global multilingual crowd, Lionbridge is a leading provider of custom audio AI training data. Learn more about how Lionbridge can help.


Multilingual Audio Data Annotation Services

Lionbridge provides professional audio data annotation services in over 300 languages.

Some of our most popular languages include:

  • Chinese audio data annotation
  • Italian audio data annotation
  • Dutch audio data annotation
  • Japanese audio data annotation
  • French audio data annotation
  • Portuguese audio data annotation
  • German audio data annotation
  • Spanish audio data annotation



The Author
Limarc Ambalina

Limarc writes content for Lionbridge’s website as part of the marketing team. Born and raised in Canada, Limarc’s love of Japanese pop culture brought him to Japan in 2016 and living in Japan has been his dream come true. Apart from Lionbridge content, you can catch Limarc online writing about anime, video games, and other nerd culture.

