How to Create Your Own Synthetic Voice With Just One Hour of Speech (Lyrebird Review)

Article by Limarc Ambalina | August 26, 2019

With deepfakes receiving a lot of media coverage recently, synthetic media is a trending topic in AI forums and a growing area of machine learning. The possible threats posed by manipulated or synthetic media have caught the attention of government officials and even led to a House of Representatives hearing in June 2019. Like every emerging technology, synthetic media comes with risks. However, companies like Lyrebird are proof that the positive applications of synthetic media outweigh the negative.

From chatbots to virtual assistants, research in automatic speech recognition (ASR) and higher-quality audio training data have led to some of the most useful tech of the current generation. Advances in natural language processing underpin much of the speech technology we have today. However, the newest wave of speech technology does not simply understand your voice; it recreates it.

Using Lyrebird technology, we created our own synthetic voice with just one hour of recorded speech. Here are the results.

 

What is Lyrebird?

Lyrebird is an AI startup based in Montreal, Canada. The company is building voice synthesis technologies and is one of the first synthetic media companies to make its prototype available for the public to try. Lyrebird can mimic the sound, accent, intonation, and rhythm of someone’s voice using just a few minutes of sample voice recordings.

Users can generate speech from their synthetic voice by simply typing out the dialogue. 

Use Cases

Synthetic voices have numerous applications in various industries. Some of the most useful and most interesting applications of synthetic voices include:

  • Human-sounding voices for virtual assistants or chatbots
  • The scaling of celebrity voices for advertisements and other voice-over work
  • Unique artificial voices for company branding 
  • Scalable dialogue creation for video games, animation, and more

Lyrebird has also partnered with the ALS Association to help those with ALS create a digital version of their voice. Some of those who suffer from ALS completely lose the ability to speak. By creating a synthetic voice avatar, they can continue to communicate using a virtual voice that sounds like them, long after they lose the ability to use their own.

Cost

The program is free to try and experiment with. New users simply need to create an account, record a few samples, and submit the recordings to train their synthetic voice. The company does not list official prices for business or commercial use. Those who want to use Lyrebird for business purposes are asked to contact the Lyrebird team directly.

 

Lyrebird Review

We recorded 360 voice samples, which totaled about one hour of recording time. We downloaded samples of our synthetic voice at the following stages of recording: 30 samples (minimum), 60, 120, 240, and 360. Below are the results after each training phase, as well as a sample recording of our real voice to compare against.

Our Lyrebird Voice After 1 Hour of Recording

Actual Voice

“Hi there. Thanks for reading the article. This is a sample recording of my real voice so you can compare it against the synthetic voice.”

30 Recordings

“Hello, this is what I sound like after 30 voice recordings which took about 5 minutes of recording time. Peter piper picked a pickled pepper. She sells sea shells by the sea shore.”

60 Recordings

“Hello again, this is what I sound like after 60 voice recordings which took about 10 minutes of recording time. Peter piper picked a pickled pepper. She sells sea shells by the sea shore.”

120 Recordings

“Fancy meeting you again, this is what I sound like after 120 voice recordings, which took about 20 minutes of recording time. Peter piper picked a pickled pepper. She sells, sea shells, by the sea shore. Am I getting better?”

240 Recordings

“Well hello again. We keep bumping into each other, don’t we? My Name is Limarc and I am a writer for Lionbridge AI. This is what I sound like after two hundred and forty voice recordings in total. This took about 40 minutes of recording time. How do you think I sound now? Am I improving? Do I sound like a robot, or do I sound like a human? I’ll let you be the judge. But the truth is, aren’t we all just people looking for a voice?”

Our Synthetic Voice Reading Isaac Asimov’s Three Laws of Robotics at 240 Recordings

“A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey orders given it by human beings except where such orders would conflict with the first law. A robot must protect its own existence as long as such protection does not conflict with the first or second law.”       – Isaac Asimov’s Laws of Robotics

360 Recordings

“Hey everyone. Once again, my Name is Limarc and I am a writer for Lionbridge AI. This is what I sound like after three hundred and sixty voice recordings. In total, this took about 60 minutes of recording time. How do you think I sound now? Have I gotten better? Do I sound like a robot, or do I sound like a human? I’ll let you be the judge.”

Our Synthetic Voice Reciting Poetry at 360 Recordings

 
“Out of the night that covers me, 
      Black as the pit from pole to pole, 
I thank whatever gods may be 
      For my unconquerable soul. 
 
In the fell clutch of circumstance 
      I have not winced nor cried aloud. 
Under the bludgeonings of chance 
      My head is bloody, but unbowed.”
 
 

 

“Beyond this place of wrath and tears 
      Looms but the Horror of the shade, 
And yet the menace of the years 
      Finds and shall find me unafraid.

It matters not how strait the gate,
      How charged with punishments the scroll,
I am the master of my fate,
      I am the captain of my soul.”

– Invictus by William Ernest Henley

 

While the synthetic voice created by Lyrebird sounded robotic at first, its quality improved significantly as it was trained with more data. Our Lyrebird voice became much clearer, and the intonation and rhythm began to sound much more human as we added more sample recordings. However, even after one hour of recording, there was still some distinct static background noise in the generated voice samples. For professional use, Lyrebird states that it can create a synthetic voice with just two hours of recording.

 

How to Create Your Own Synthetic Voice Using Lyrebird

Using the Lyrebird online platform is simple, even for those with no experience in machine learning.


 

1. Create Your Account

The first step is to create your own account on the Lyrebird signup page. Lyrebird doesn’t ask for payment information or other personal details. All you need to enter is your email address, a display name, and a password.

2. Start Recording

Once your account is set up, you can immediately start recording your voice samples. All it takes is 30 voice samples, or about 5 minutes of recording, for Lyrebird to create a synthetic copy of your voice. However, the more samples you give Lyrebird, the better your synthetic voice will be. It is also highly recommended to record your voice samples in a quiet room with no background noise.

In terms of hardware, we saw significant improvement when switching from our laptop’s built-in microphone to a dedicated external mic. Specifically, we used the BOYA BY-M1DM omni-directional lavalier mic when recording our voice samples.
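One rough way to compare two recording setups before committing to a full session is to record a few seconds of room silence with each and measure the level. The sketch below is not part of Lyrebird. It is a local check that uses Python's standard `wave` module and assumes 16-bit PCM WAV files; the file names in the comments are hypothetical.

```python
import array
import math
import sys
import wave


def rms_dbfs(path):
    """Return the RMS level of a 16-bit PCM WAV file in dB relative to full scale.

    For multi-channel files, all channels are measured together.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        # WAV data is little-endian, so swap bytes on big-endian machines.
        samples.byteswap()
    if not samples:
        raise ValueError("file contains no audio frames")

    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / 32768 ** 2)


# Hypothetical usage: record the same quiet room with each microphone,
# then compare the two numbers. A lower (more negative) value means less noise.
# print(rms_dbfs("silence_laptop_mic.wav"))
# print(rms_dbfs("silence_lavalier_mic.wav"))
```

The number is only meaningful as a relative comparison between two setups recorded under the same conditions; it does not say anything about how Lyrebird itself processes the audio.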

Each sample recording is just one or two sentences and takes around 6 to 10 seconds to record, so 30 voice samples should take about five minutes.
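The timings reported in this review work out to roughly 10 seconds per sample (30 samples in about 5 minutes, 360 samples in about 60 minutes). As a small planning aid, the sketch below estimates total recording time for a given number of samples. The 10-second figure is an assumption taken from our own session, and the function name is ours, not anything provided by Lyrebird.

```python
# Assumption: about 10 seconds per sample, which matches the timings in this
# review (30 samples in ~5 minutes, 360 samples in ~60 minutes).
SECONDS_PER_SAMPLE = 10


def estimated_minutes(num_samples, seconds_per_sample=SECONDS_PER_SAMPLE):
    """Estimate total recording time in minutes for a number of samples."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return num_samples * seconds_per_sample / 60


# The stages at which we downloaded our synthetic voice.
for stage in (30, 60, 120, 240, 360):
    print(f"{stage} samples: about {estimated_minutes(stage):.0f} minutes")
```

If your own samples are closer to 6 seconds each, pass `seconds_per_sample=6` to get a lower estimate.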

3. Create Your Digital Voice

Once you have the required 30 sample recordings, the “Create my digital voice” button should appear. Click the button and your synthetic voice will be put in the queue for training. Once your voice is ready, you will receive an email notification. You can then start typing out dialogue to generate samples of your virtual voice.

4. Add More Recordings

You will likely notice that 30 recordings aren’t enough to make a convincing synthetic copy of your voice. The first version will likely sound robotic, with strange intonation. To add more recordings, simply click the recordings tab and record more samples until you are satisfied. Finally, click the “Recreate my digital voice” button to put your new recordings into the queue to train.

 

While the technology is still in its early phases, synthetic voices show potential to improve various industries and lives around the world. Are you looking to create synthetic voices of your own? Building a model that requires a large corpus of audio data? Lionbridge AI has access to a global crowd of 500,000 experts ready to collect, create, or annotate your audio data. Learn more about how we can provide audio training data for your algorithms.

 

The Author
Limarc Ambalina

Limarc writes content for Lionbridge’s website as part of the marketing team. Born and raised in Canada, Limarc’s love of Japanese pop culture brought him to Japan in 2016 and living in Japan has been his dream come true. Apart from Lionbridge content, you can catch Limarc online writing about anime, video games, and other nerd culture.