Evolutionary Discriminative Confidence Estimation for Spoken Term Detection
Spoken Term Detection (STD) is the task of searching for occurrences of spoken terms in audio archives. It relies on robust confidence estimation to make a hit/False Alarm (FA) decision. To optimize this decision with respect to the STD evaluation metric, the confidence has to be discriminative. Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs) perform well at producing discriminative confidence; however, they are severely limited by their continuous objective functions and are therefore less capable of handling complex decision tasks. This leads to a substantial performance reduction in the detection of Out-Of-Vocabulary (OOV) terms, where the high diversity in term properties usually leads to a complicated decision boundary.