
Microsoft's latest NaturalSpeech2 speech synthesis model: provides more accurate speech reconstruction and avoids stick reading effects

WBOY
2023-08-04 09:41:05

According to a July 27 report, Microsoft recently launched a speech synthesis model called NaturalSpeech2. The model adopts a "latent diffusion" design and performs outstandingly in zero-shot speech synthesis. Microsoft claims that it offers a "commercial-grade" speech and singing solution that gives users high-quality, diverse synthesized speech.
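
The "latent diffusion" idea can be illustrated with a generic sketch. The Python below shows only the forward (noising) half of a standard DDPM-style Gaussian diffusion applied to a continuous latent vector, z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε; a trained model learns to reverse this process. The linear noise schedule, the step count, and the example latent values are assumptions chosen for illustration, not NaturalSpeech2's actual formulation or settings.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction after step t under an assumed linear beta schedule."""
    prod = 1.0
    for i in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(z0, t, num_steps=1000, rng=None):
    """Forward diffusion q(z_t | z_0) on a continuous latent vector z0 (list of floats).

    Returns the noised latent z_t and the Gaussian noise eps that was mixed in.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is reproducible
    ab = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e for z, e in zip(z0, eps)]
    return zt, eps


# Hypothetical 4-dimensional latent frame (real speech latents are much larger).
z0 = [0.5, -1.2, 0.3, 0.8]
early, _ = add_noise(z0, t=0)    # almost identical to z0
late, _ = add_noise(z0, t=999)   # almost pure Gaussian noise
```

At t = 0 the latent is barely changed; by the last step ᾱ_t has fallen below 0.001, so almost none of the original signal remains. In a latent diffusion TTS model, generation runs the other way: a trained network gradually denoises a random latent, conditioned on the text and a speech prompt, back into a clean one.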

Microsoft released a series of demos showing NaturalSpeech2 generating speech with different speaker identities, prosody, and styles (such as singing) in a zero-shot setting, that is, from only a short speech prompt rather than training data for the target speaker.

▲ Image source: the NaturalSpeech 2 paper

It is reported that, unlike traditional text-to-speech (TTS) systems, NaturalSpeech2 represents speech with "continuous vectors" rather than "discrete tokens". This lets it generate more complete speech segments and avoid "stick reading": the flat, word-by-word delivery that lacks emotion.
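
To make the continuous-versus-discrete distinction concrete, here is a toy Python sketch. A discrete-token representation keeps only the index of the nearest entry in a codebook, so any detail between entries is lost on decoding; a continuous representation keeps the vector itself. The 4-entry 2-D codebook and the sample frame are hypothetical, and the sketch illustrates only the general trade-off, not the codec used by NaturalSpeech2.

```python
import math

# Hypothetical codebook of four 2-D vectors (real codecs use far larger codebooks).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def to_token(vec):
    """Discrete representation: index of the nearest codebook entry."""
    return min(range(len(CODEBOOK)), key=lambda i: math.dist(vec, CODEBOOK[i]))


def from_token(idx):
    """Decoding a token can only give back the codebook entry itself."""
    return CODEBOOK[idx]


frame = (0.7, 0.4)                # one hypothetical continuous latent frame
token = to_token(frame)           # discrete path: store only an index
decoded = from_token(token)
discrete_error = math.dist(frame, decoded)
continuous_error = math.dist(frame, frame)  # continuous path keeps the vector as-is
```

Here the frame (0.7, 0.4) becomes token 1, which decodes to (1.0, 0.0), a reconstruction error of 0.5, while the continuous path loses nothing. Real systems shrink the discrete error with larger or stacked codebooks, at the cost of more tokens per frame.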

▲ Image source: the NaturalSpeech 2 paper

The experimental results show that, under zero-shot conditions, the prosody of the speech generated by NaturalSpeech2 closely matches that of the speech prompt and of real recordings, and its naturalness (measured by CMOS) on the LibriTTS and VCTK test sets is indistinguishable from human speech. The project's paper has been published on GitHub, where interested IT House readers can visit it.
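
For readers unfamiliar with CMOS (comparative mean opinion score): listeners hear a pair of samples, here a synthesized clip and the corresponding human recording, and typically rate the first relative to the second on a scale from −3 to +3. CMOS is the average of those ratings, so a value near 0 means listeners had no consistent preference. The sketch below computes it; the ratings are made-up placeholders, not data from the paper.

```python
def cmos(ratings):
    """Comparative MOS: mean of pairwise comparison scores on a -3..+3 scale.

    Each rating compares a synthesized clip against a reference recording:
    positive values favour the synthesized clip, negative values the reference.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not -3 <= r <= 3:
            raise ValueError(f"rating {r} is outside the -3..+3 scale")
    return sum(ratings) / len(ratings)


# Made-up ratings from eight hypothetical listeners.
ratings = [0, 1, -1, 0, 0, 1, -1, 0]
print(cmos(ratings))  # 0.0, i.e. no consistent preference either way
```

Published evaluations usually report a confidence interval alongside the mean, since a CMOS near zero from only a handful of listeners says little on its own.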

Statement:
This article is reproduced from 51cto.com. In case of infringement, please contact admin@php.cn to have it deleted.