Speech activity detection (SAD) on channel transmissions is a critical preprocessing task for speech, speaker and language recognition or for further human analysis. This paper presents a feature combination approach to improve SAD on highly channel degraded speech as part of the Defense Advanced Research Projects Agency's (DARPA) Robust Automatic Transcription of Speech (RATS) program. The key contribution is the feature combination exploration of different novel SAD features based on pitch and spectro-temporal processing and the standard Mel Frequency Cepstral Coefficients (MFCC) acoustic feature. The SAD features are: (1) a GABOR feature representation, followed by a multilayer perceptron (MLP); (2) a feature that combines multiple voicing features and spectral flux measures (Combo); (3) a feature based on subband autocorrelation (SAcC) and MLP postprocessing and (4) a multiband comb-filter F0 (MBCombF0) voicing measure. We present single, pairwise and all feature combinations, show high error reductions from pairwise feature level combination over the MFCC baseline and show that the best performance is achieved by the combination of all features.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.