Technology barriers currently prevent speech processing systems that operate in extremely noisy conditions from meeting the demands of modern applications. This letter presents a new voice activity detector (VAD) that improves the robustness of speech detection in noisy environments and the performance of speech recognition systems. The algorithm defines an optimum likelihood ratio test (LRT) involving multiple and independent observations. The resulting decision rule yields significant improvements in speech/nonspeech discrimination accuracy over existing VAD methods that are defined on a single observation and need empirically tuned hangover mechanisms. The algorithm has an inherent delay that, for several applications, including robust speech recognition, does not represent a serious implementation obstacle. An analysis of the overlap between the distributions of the decision variable shows the improved robustness of the proposed approach: the classification error is clearly reduced as the number of observations is increased. The proposed strategy is also compared to different VAD methods, including the G.729, AMR, and AFE standards, as well as recently reported algorithms, and shows a sustained advantage in speech/nonspeech detection accuracy and speech recognition performance.
Index Terms: Multiple observation likelihood ratio test (MO-LRT), robust speech recognition, voice activity detection.
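To make the idea of a multiple-observation test concrete, the following is a minimal sketch, not the letter's actual implementation. It relies only on the fact that, for independent observations, the joint log-likelihood ratio is the sum of the per-observation log-likelihood ratios. The function name `mo_lrt_decisions`, the window half-length `m`, the `threshold` parameter, and the truncation of the window at the signal edges are all assumptions made for illustration; how each per-frame log-likelihood ratio is computed is left outside the sketch.

```python
from typing import List, Sequence


def mo_lrt_decisions(frame_llr: Sequence[float], m: int, threshold: float) -> List[bool]:
    """Hypothetical multiple-observation LRT decision rule.

    frame_llr[t] is assumed to be
        log p(x_t | speech) - log p(x_t | nonspeech)
    for frame t. Under the independence assumption, the log-likelihood
    ratio of a window of frames is the sum of the per-frame values.
    """
    n = len(frame_llr)
    decisions = []
    for t in range(n):
        lo = max(0, t - m)      # assumption: truncate the window at the edges
        hi = min(n, t + m + 1)  # uses m future frames, hence the inherent delay
        window_sum = sum(frame_llr[lo:hi])
        decisions.append(window_sum > threshold)  # True means "speech"
    return decisions


# Illustrative, made-up per-frame log-likelihood ratios.
llr = [-1.0, -0.5, 2.0, -0.2, 1.5, 1.8, -0.1, -1.2, -0.9]
single = mo_lrt_decisions(llr, m=0, threshold=0.0)  # single-observation test
multi = mo_lrt_decisions(llr, m=1, threshold=0.0)   # three-frame window
```

With `m=0` the rule reduces to a single-observation test, and the isolated negative value at index 3 produces a nonspeech decision inside a speech-like run. With `m=1` the neighbouring frames outweigh it, which is the kind of smoothing that single-observation detectors usually obtain from a separately tuned hangover scheme. The price is that each decision needs `m` future frames.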