Date Added: Dec 2009
In this paper, the authors propose a novel way of using two different feature streams in a continuous speech recognition system. Conventionally, multiple feature streams are concatenated into a single feature vector, and HMMs are trained on it to build triphone or syllable models. Instead of concatenating the streams, the authors train separate subword HMMs for each feature stream. During training, they also evaluate how relevant each feature stream is to a particular sound. During testing, hypotheses are generated by the language model, and a greedy Kullback-Leibler distance measure is used to determine the best feature stream at a particular instant for the given hypotheses.
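The summary does not say exactly how the Kullback-Leibler measure is applied. The sketch below shows one plausible reading and is not the authors' method. It assumes that each competing hypothesis is represented in every stream by a single diagonal-covariance Gaussian. It scores each stream by the mean pairwise symmetric KL divergence between those Gaussians, then greedily commits to the best-separating stream at each instant without lookahead. The function names, the `(mean, variance)` model format and the stream labels are all hypothetical.

```python
import math
from itertools import combinations


def kl_diag_gauss(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) for diagonal-covariance Gaussians."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )


def symmetric_kl(a, b):
    """Symmetrised KL between two (mean, variance) models."""
    return kl_diag_gauss(a[0], a[1], b[0], b[1]) + kl_diag_gauss(b[0], b[1], a[0], a[1])


def stream_separation(models):
    """Mean pairwise symmetric KL among the hypotheses' models in one stream.

    Assumption: a larger value means the stream discriminates the
    competing hypotheses better at this instant.
    """
    pairs = list(combinations(models, 2))
    if not pairs:
        return 0.0
    return sum(symmetric_kl(a, b) for a, b in pairs) / len(pairs)


def pick_stream(hyp_models_by_stream):
    """Greedy choice: the stream whose hypothesis models are farthest apart."""
    return max(
        hyp_models_by_stream,
        key=lambda s: stream_separation(hyp_models_by_stream[s]),
    )


def select_per_frame(frames):
    """Commit to the locally best stream at every frame (no lookahead)."""
    return [pick_stream(frame) for frame in frames]


# Hypothetical example: two hypotheses, two streams, 2-D features.
frame = {
    "stream_A": [([0.0, 0.0], [1.0, 1.0]), ([0.1, 0.0], [1.0, 1.0])],
    "stream_B": [([0.0, 0.0], [1.0, 1.0]), ([2.0, 0.0], [1.0, 1.0])],
}
print(pick_stream(frame))  # stream_B: its hypothesis models are far better separated
```

The greedy step is deliberately local: each instant uses whichever stream best separates the current hypotheses. This is cheap, but it ignores how a choice affects later frames. The per-sound relevance scores learned during training could be folded in, for example as weights on `stream_separation`, but the summary does not describe how the two are combined.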