With deepfakes receiving a lot of media coverage recently, synthetic media is a trending topic in AI forums and a growing area of machine learning. The possible threats posed by manipulated or synthetic media have caught the attention of government officials and even led to a House of Representatives hearing in June 2019. Like every new and emerging technology, synthetic media comes with risks. However, companies like Lyrebird show that synthetic media also has valuable positive applications.
From chatbots to virtual assistants, research in ASR and higher-quality audio training data have led to some of the most useful tech of the current generation. Natural language processing has led to the great developments in speech technology we have today. However, the newest wave of speech technology does not simply understand your voice; it recreates it.
Using Lyrebird technology, we created our own synthetic voice with just one hour of recorded speech. Here are the results.
NOTE: Since the original publishing of this article, Lyrebird has been acquired by Descript. Their services and signup procedures may have changed. We will update this article in the near future.
What is Lyrebird?
Lyrebird is an AI startup based out of Montreal, Canada. The company is building voice synthesis technologies and is one of the first synthetic media companies to make their prototype available for the public to try. Lyrebird can mimic the sound, accent, intonation, and rhythm of someone’s voice using just a few minutes of sample voice recordings.
Users can generate speech from their synthetic voice by simply typing out the dialogue.
Synthetic voices have numerous applications in various industries. Some of the most useful and most interesting applications of synthetic voices include:
- Human-sounding voices for virtual assistants or chatbots
- The scaling of celebrity voices for advertisements and other voice-over work
- Unique artificial voices for company branding
- Scalable dialogue creation for video games, animation, and more
Lyrebird has also partnered with the ALS Association to help those with ALS create a digital version of their voice. Some of those who suffer from ALS completely lose the ability to speak. By creating a synthetic voice avatar, they can continue to communicate using a virtual voice that sounds like them, long after they lose the ability to use their own.
The program is free to try and experiment with. New users simply need to create an account, record a few samples, and submit the recordings to train their synthetic voice. The company does not list official prices for business or commercial use. Those who want to use Lyrebird for business purposes are asked to contact the Lyrebird team directly.
We recorded 360 voice samples, which totaled about one hour of recording time. We downloaded samples of our synthetic voice at the following stages of recording: 30 samples (the minimum), 60, 120, 240, and 360. Below are the results after each training phase, along with a sample recording of our real voice for comparison.
Our Lyrebird Voice After 1 Hour of Recording
“Hi there. Thanks for reading the article. This is a sample recording of my real voice so you can compare it against the synthetic voice.”
“Hello, this is what I sound like after 30 voice recordings which took about 5 minutes of recording time. Peter Piper picked a pickled pepper. She sells sea shells by the sea shore.”
“Hello again, this is what I sound like after 60 voice recordings which took about 10 minutes of recording time. Peter Piper picked a pickled pepper. She sells sea shells by the sea shore.”
“Fancy meeting you again, this is what I sound like after 120 voice recordings, which took about 20 minutes of recording time. Peter Piper picked a pickled pepper. She sells, sea shells, by the sea shore. Am I getting better?”
“Well hello again. We keep bumping into each other, don’t we? My name is Limarc and I am a writer for Lionbridge AI. This is what I sound like after two hundred and forty voice recordings in total. This took about 40 minutes of recording time. How do you think I sound now? Am I improving? Do I sound like a robot, or do I sound like a human? I’ll let you be the judge. But the truth is, aren’t we all just people looking for a voice?”
Our Synthetic Voice Reading Isaac Asimov’s Three Laws of Robotics at 240 Recordings
“A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey orders given it by human beings except where such orders would conflict with the first law. A robot must protect its own existence as long as such protection does not conflict with the first or second law.” – Isaac Asimov’s Laws of Robotics
“Hey everyone. Once again, my name is Limarc and I am a writer for Lionbridge AI. This is what I sound like after three hundred and sixty voice recordings. In total, this took about 60 minutes of recording time. How do you think I sound now? Have I gotten better? Do I sound like a robot, or do I sound like a human? I’ll let you be the judge.”
Our Synthetic Voice Reciting Poetry at 360 Recordings
“Beyond this place of wrath and tears
Looms but the Horror of the shade,
And yet the menace of the years
Finds and shall find me unafraid.
It matters not how strait the gate,
How charged with punishments the scroll,
I am the master of my fate,
I am the captain of my soul.”
– Invictus by William Ernest Henley
While a synthetic voice created by Lyrebird may sound robotic at first, the quality of ours improved significantly as it was trained on more data. Our Lyrebird voice became much clearer, and the intonation and rhythm began to sound much more human as we added more sample recordings. However, even after one hour of recording, there was still distinct static background noise in the generated voice samples. For professional use, Lyrebird states that it can create a synthetic voice with just two hours of recording.
How to Create Your Own Synthetic Voice Using Lyrebird
Using the Lyrebird online platform is simple and easy, even for those with no experience in machine learning.
1. Create Your Account
The first step is to create your own account on the Lyrebird signup page. Lyrebird doesn’t ask for payment information or any other personal details. All you need to enter is your email address, a display name, and a password.
2. Start Recording
Once your account is set up, you can immediately start recording your voice samples. All it takes is 30 voice samples, or 5 minutes of recording, for Lyrebird to create a synthetic copy of your voice. However, the more samples you give Lyrebird, the better your synthetic voice will be. It is also highly recommended to record your voice samples in a quiet room with no background noise.
In terms of hardware, we saw significant improvement when switching from our laptop’s built-in microphone to a dedicated external mic. Specifically, we used the BOYA BY-M1DM omni-directional lavalier mic when recording our voice samples.
Each sample recording is just one or two sentences and takes around 6 to 10 seconds to record, so 30 voice samples should take about five minutes.
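As a rough check on these timings, here is a minimal sketch in Python. It assumes each sample takes 10 seconds, the upper end of the range quoted above, which is the figure that matches the recording times we report for each stage. The function names and the constant are hypothetical helpers for illustration and are not part of Lyrebird.

```python
# Rough recording-time estimate for a given number of voice samples.
# ASSUMPTION: 10 seconds per sample (upper end of the 6 to 10 second range).
SECONDS_PER_SAMPLE = 10

# The stages at which we downloaded our synthetic voice.
MILESTONES = [30, 60, 120, 240, 360]


def estimated_minutes(num_samples, seconds_per_sample=SECONDS_PER_SAMPLE):
    """Return the estimated recording time in minutes."""
    return num_samples * seconds_per_sample / 60


def recording_plan(milestones=MILESTONES):
    """Map each sample-count milestone to its estimated minutes."""
    return {n: estimated_minutes(n) for n in milestones}


if __name__ == "__main__":
    for samples, minutes in recording_plan().items():
        print(f"{samples:>3} samples: about {minutes:.0f} minutes")
```

If your own samples run shorter, pass a smaller value for seconds_per_sample to see how the total changes.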
3. Create Your Digital Voice
Once you have the required 30 sample recordings, the “Create my digital voice” button should appear. Click the button and your synthetic voice will be put in the queue for training. Once your voice is ready, you will receive an email notification. You can then start typing out dialogue to generate samples of your virtual voice.
4. Add More Recordings
You will likely notice that 30 recordings isn’t enough to make a convincing synthetic copy of your voice. The first version will likely sound robotic with strange intonation. To add more recordings, simply click the recordings tab and record more samples until you are satisfied. Finally, click the “Recreate my digital voice” button to put your new recordings into the queue to train.
While the technology is still in its early phases, synthetic voices show potential to improve various industries and lives around the world. Are you looking to create synthetic voices of your own? Building a model that requires a large corpus of audio data? Lionbridge AI has access to a global crowd of 500,000 experts ready to collect, create, or annotate your audio data. Learn more about how we can provide audio training data for your algorithms.