Order Statistics for Voice Activity Detection in VoIP

Real-time voice communication over the Internet has rapidly gained popularity. Reducing total bandwidth consumption is essential for using the available bandwidth efficiently, particularly for subscribers with low-speed connections but also for all other users. In this paper, the authors introduce a novel technique for identifying the voiced and silent regions of a speech stream that is well suited to VoIP calls. They use an entropy measure based on the spacings of order statistics of speech frames to differentiate silence zones from speech zones. They develop an algorithm that uses adaptive thresholding to minimize misdetection. The performance of their approach is compared with that of the voice activity detector (VAD) built into the AMR codec.
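The abstract does not give the exact entropy estimator or threshold update rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes Vasicek's m-spacing estimator as the entropy measure, 160-sample frames (20 ms at 8 kHz), an initial run of silent frames, and a hypothetical adaptive threshold that tracks the entropy of silence with an exponential average. The names spacing_entropy and detect_voice and the parameters m, margin, alpha, and init_frames are illustrative and are not taken from the paper.

```python
import numpy as np


def spacing_entropy(frame, m=None, eps=1e-12):
    """Entropy estimate from m-spacings of the frame's order statistics.

    Assumption: Vasicek's estimator, which may differ from the paper's measure.
    """
    x = np.sort(np.asarray(frame, dtype=float))  # order statistics
    n = x.size
    if m is None:
        m = max(1, int(round(np.sqrt(n))))  # common heuristic window size
    idx = np.arange(n)
    # Indices beyond the ends are clamped to the smallest/largest sample.
    upper = x[np.minimum(idx + m, n - 1)]
    lower = x[np.maximum(idx - m, 0)]
    spacings = upper - lower
    # eps guards against log(0) when a frame contains repeated values.
    return float(np.mean(np.log(n / (2.0 * m) * spacings + eps)))


def detect_voice(signal, frame_len=160, margin=1.0, alpha=0.9, init_frames=5):
    """Return one boolean per frame: True for speech, False for silence.

    Hypothetical adaptive rule: the threshold is a running estimate of the
    silence entropy plus a fixed margin, updated only on silent frames.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    entropies = np.array([
        spacing_entropy(signal[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ])
    # Assumption: the first few frames contain background noise only.
    noise_level = entropies[:init_frames].mean()
    flags = np.zeros(n_frames, dtype=bool)
    for i, h in enumerate(entropies):
        if h > noise_level + margin:
            flags[i] = True
        else:
            # Adapt the silence estimate so the threshold follows the noise.
            noise_level = alpha * noise_level + (1.0 - alpha) * h
    return flags
```

In this sketch a frame is labeled as speech when its entropy exceeds the tracked silence level by margin, which relies on louder, more spread-out frames having higher differential entropy than low-level background noise. Frames flagged as silence would be the candidates for suppression to save bandwidth.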