This article is an installment of Future Explored, a weekly guide to world-changing technology. You can get stories like this one straight to your inbox every Thursday morning by subscribing here.
In January, Microsoft unveiled an AI that can clone a speaker’s voice after hearing them talk for just three seconds. While this system, VALL-E, was far from the first voice cloning AI, its accuracy and need for such a small audio sample set a new bar for the tech.
Microsoft has now raised that bar again with an update called “VALL-E X,” which can clone a voice from a short sample (4 to 10 seconds) and then use it to synthesize speech in a different language, all while preserving the original speaker’s voice, emotion, and tone.
Comments are closed.