When I was a kid, voice changers were the coolest thing ever. Everyone wanted to call their family and friends sounding like Darth Vader or a squeaky elf, using what looked like a voice raygun held to the telephone receiver.
As the years went on, voice changing tools became increasingly complex, and applications like Skype and then Snapchat picked up fun voice changing effects. Even then, the possibilities were limited, and the effects were simply overlaid on the user’s own speech. The sound quality changed, but other factors such as word choice and rhythm of speech remained those of the user’s natural voice.
Enter Lyrebird stage left with the most advanced voice mimicking technology to date. Lyrebird lays claim to an artificial intelligence that can learn vocal tone and rhythm within a matter of minutes, giving it the sometimes startling capability to mimic any voice, albeit with a little bit of background distortion.
Sound clip from Lyrebird
That distortion is easily masked by adding a little white noise. The result, even in beta, is surprisingly realistic speech in the voice of anyone in the world, with the same natural sway and emotion you might expect from a human.
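For the curious, here is roughly what "adding a little white noise" can look like in code. This is only a minimal sketch, not Lyrebird's method: the function name `add_white_noise`, the 30 dB signal-to-noise ratio, and the sine-wave stand-in for a synthesized clip are all assumptions made for illustration.

```python
import math
import random


def add_white_noise(samples, snr_db, seed=None):
    """Return a copy of `samples` with Gaussian white noise mixed in at `snr_db`."""
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Hypothetical stand-in for a synthesized clip: one second of a 220 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
clip = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# Assumed setting: noise 30 dB quieter than the voice, so it sits in the background.
noisy = add_white_noise(clip, snr_db=30, seed=0)
```

A lower `snr_db` means louder noise, so in practice you would pick the highest value that still hides the artifacts.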
Just as many disruptive and emergent technologies have opened up new ethical concerns (virtual reality porn, for example), this AI speech synthesizer has great potential to be used by unsavory types.
Lyrebird wasn’t created for criminals and fraudsters.
In fact, Lyrebird and other speech synthesizers were created to convert text into speech that conveys the same complexity of meaning as human speech. That is a great help for people with disabilities that prevent them from reading text, and a major source of convenience and safety even for everyday users.
However, such technology raises obvious concerns about people who will fabricate speech in someone else's voice and use it against that person for nefarious purposes, yet another facet of fake news culture on the internet. Examples of synthesized speech from famous politicians such as Obama and Trump already exist.
One might expect that such a complex-sounding AI would require hours of training to properly mimic a voice, and even longer to generate its own sentences in that voice, but this is not the case. Lyrebird can learn to mimic a voice from only a one-minute recording and can then generate a thousand sentences true to that voice in under half a second.
It does this by learning the general makeup of speech, the “speech DNA,” during the initial training of the AI. Those steps do not have to be repeated from the very beginning for each new voice. Instead, the AI learns only the DNA of the new voice, not the entire structure of speech again, which is what lends it its speed.
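To make that split a bit more concrete, here is a toy sketch of the general idea: one part of the model is learned once and then frozen, and each new voice only has to fit a tiny set of numbers of its own. The article does not describe Lyrebird's actual architecture, so everything below (the name `SHARED_WEIGHT`, the two-number "embedding," the made-up recording) is a hypothetical stand-in rather than how Lyrebird really works.

```python
# Toy illustration only: every name and number here is hypothetical.

SHARED_WEIGHT = 2.0  # stands in for what the big, one-time training learned; frozen below


def synthesize(x, embedding):
    """Map a text feature x to an acoustic value for the voice described by `embedding`."""
    offset, scale = embedding
    return SHARED_WEIGHT * scale * x + offset


def adapt_to_new_voice(recording, steps=2000, lr=0.1):
    """Fit only the two-number voice embedding by gradient descent on squared error."""
    offset, scale = 0.0, 1.0
    n = len(recording)
    for _ in range(steps):
        grad_offset = grad_scale = 0.0
        for x, y in recording:
            err = synthesize(x, (offset, scale)) - y
            grad_offset += 2 * err / n
            grad_scale += 2 * err * SHARED_WEIGHT * x / n
        # SHARED_WEIGHT is never updated; only the per-voice numbers move.
        offset -= lr * grad_offset
        scale -= lr * grad_scale
    return offset, scale


# A short "recording" from a new voice whose true embedding is (0.3, 1.4).
TRUE_EMBEDDING = (0.3, 1.4)
recording = [(i / 19, synthesize(i / 19, TRUE_EMBEDDING)) for i in range(20)]

learned = adapt_to_new_voice(recording)
print(learned)
```

Because only two numbers are fitted per voice, adapting to a new speaker needs far less data and time than training the shared part, which is the intuition behind the speed described above.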
Hopefully, though, humanity can pull together to use this novel technology for good more often than for evil. Anti-AI AI tools are already cropping up as well, alerting people when they are interacting with an AI instead of a human.
Lyrebird has an array of features beyond mimicking celebrities. Its real-time voice conversion AI can learn existing voices, and users can also generate new and unique voices for their own projects. Plus, the Lyrebird AI can take emotional input and shape its voice to express anger, stress, sympathy, and more. Additionally, small human sounds such as breathing and lip smacking make the AI sound even more lifelike, as an absence of these sounds can make a voice sound uncannily inhuman.
Lyrebird could be used to great benefit in an array of industries from marketing to video games and everything in between, but with great power comes great responsibility. No matter what, I’m sure we will be hearing exponentially increasing amounts of AI-generated speech in the very near future.
Are you excited about synthetic speech? What are your concerns about it? What about its most useful benefits? Let us know in the comments below!