Under severe channel mismatch, such as training on far-field speech and testing on telephone data, Speaker Identification (SID) performance degrades significantly, often below the level needed for practical use. For many SID tasks, however, it is sufficient to return an N-best list of candidate speakers for further human analysis. This paper investigates N-best SID accuracy under matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), Pitch and Formant Frequency Histograms (PFH), and cohort-based cross-channel adaptation, the authors reduced the matched-channel top-1 error rate by over 25% relative to the GMM-UBM baseline, and achieved mismatched-channel N-best accuracy comparable to that baseline.
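As a minimal sketch of the two quantities the abstract relies on, the Python below computes N-best accuracy from a matrix of per-speaker scores and a relative error reduction. The function names, the score matrix, and the error rates are illustrative assumptions, not values or code from the paper, and the scoring itself (GSV, PFH, GMM-UBM) is not modelled.

```python
import numpy as np


def n_best_accuracy(scores, true_ids, n):
    """Fraction of test utterances whose true speaker is among the
    n highest-scoring candidates.

    scores:   (num_utterances, num_speakers) array, higher = more likely
    true_ids: (num_utterances,) array of true speaker column indices
    """
    scores = np.asarray(scores, dtype=float)
    true_ids = np.asarray(true_ids)
    # Column indices of the n best-scoring speakers for each utterance.
    top_n = np.argsort(-scores, axis=1)[:, :n]
    # An utterance counts as a hit if its true speaker is in that list.
    hits = (top_n == true_ids[:, None]).any(axis=1)
    return float(hits.mean())


def relative_error_reduction(baseline_error, new_error):
    """Error reduction expressed as a fraction of the baseline error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical toy scores: 3 test utterances, 4 enrolled speakers.
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.5, 0.6, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]
toy_true = [0, 1, 0]

top1 = n_best_accuracy(toy_scores, toy_true, n=1)
top2 = n_best_accuracy(toy_scores, toy_true, n=2)

# Hypothetical error rates: 20% baseline, 15% for the improved system.
reduction = relative_error_reduction(0.20, 0.15)
```

On this toy data only the first utterance is correct at top-1, while the second is recovered at top-2, which is the situation an N-best list is meant to handle. A drop from a 20% to a 15% error rate is a 25% relative reduction, the same kind of figure the abstract reports for the matched-channel case.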
- Format: PDF
- Size: 140.7 KB