Researchers from the Beijing-based technology firm trained their text-to-speech synthesis system on more than 800 hours of audio, taken from around 2,400 different speakers.
To work at its best, Deep Voice requires 100 five-second samples of sound, but it can trick a voice recognition system 95 per cent of the time with just ten five-second samples.
The technology could duplicate the voices of people who have lost the ability to speak, its developers say.