Download Now Free registration required
This paper addresses selecting between candidate pronunciations for out-of-vocabulary words in speech processing tasks. The authors introduce a simple, unsupervised method that outperforms the conventional supervised method of forced alignment with a reference. The success of this method is independently demonstrated using three metrics from large-scale speech tasks: word error rates for large vocabulary continuous speech recognition, decision error tradeoff curves for spoken term detection, and phone error rates compared to a handcrafted pronunciation lexicon. The experiments were conducted using state-of-the-art recognition, indexing, and retrieval systems. The results were compared across many terms, hundreds of hours of speech, and well known data sets.
- Format: PDF
- Size: 189.4 KB