This paper describes how to reduce phone label errors in TTS voice building by modeling speaker pronunciation variants. Each speaker has his or her own pronunciations (and context-dependent variations), so no single standard lexicon can cover all of a speaker's variations. Creating speaker-dependent pronunciation lexicons for the automatic speech labeling of TTS voice databases helped eliminate many pronunciation errors that resulted from mismatches between lexical pronunciations and how the speaker (voice talent) actually pronounced a word. The authors also found that the approach contributed to other improvements in synthesis quality. A perceptual test showed that it contributed to an improvement in mean opinion score (MOS) for American English male and female voices.
- Format: PDF
- Size: 51.5 KB