Robust speech recognition in noisy environments based on subband spectral centroid histograms

Gajić, Bojana; Paliwal, Kuldip K.

doi:10.1109/tsa.2005.855834

Cited by 52 publications

(45 citation statements)

References 19 publications


“…It can be seen clearly that as compared to others, speech signals have significantly more weight on low frequency spectrum from 300Hz to 600Hz. In order to accomplish speech detection in real time, we have implemented the SSCH (Subband Spectral Centroid Histogram) algorithm [29] on mobile devices. Specifically, SSCH passes the power spectrum of the recorded sound clip to a set of highly overlapping bandpass filters and then computes the spectral centroid on each subband and finally constructs a histogram of the subband spectral centroid values.…”

Section: Real-time Background Sound Recognition (mentioning)

confidence: 99%

A framework of energy efficient mobile sensing for automatic user state recognition

Wang, Lin, Annavaram et al. 2009

Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services


Urban sensing, participatory sensing, and user activity recognition can provide rich contextual information for mobile applications such as social networking and location-based services. However, continuously capturing this contextual information on mobile devices is difficult due to battery life limitations. In this paper, we present the framework design for an Energy Efficient Mobile Sensing System (EEMSS) that powers only necessary and energy efficient sensors and manages sensors hierarchically to recognize user state as well as detect state transitions. We also present the design, implementation, and evaluation of EEMSS that automatically recognizes user daily activities in real time using sensors on an off-the-shelf high-end smart phone. Evaluation of EEMSS with 10 users over one week shows that it increases the smart phone's battery life by more than 75% while maintaining both high accuracy and low latency in identifying transitions between end-user activities.
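The three SSCH steps named in the citation statement above (overlapping bandpass filters on the power spectrum, a centroid per subband, a histogram of the centroids) can be illustrated with a minimal sketch. This is not the implementation from either paper: the function name `ssch_frame`, the triangular half-overlapping filters, the uniform band spacing, and the unweighted count histogram are all assumptions made for illustration.

```python
import numpy as np

def ssch_frame(frame, sample_rate, n_subbands=8, n_bins=16):
    """Toy subband spectral centroid histogram for one audio frame.

    Filter shape, band spacing and histogram weighting are illustrative
    assumptions, not taken from the cited papers.
    """
    # Step 0: power spectrum of the windowed frame.
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    nyquist = sample_rate / 2.0

    # Step 1: overlapping triangular bandpass filters. Each band peaks at
    # `mid` and falls to zero at the peaks of its two neighbours.
    edges = np.linspace(0.0, nyquist, n_subbands + 2)
    centroids = []
    for lo, mid, hi in zip(edges[:-2], edges[1:-1], edges[2:]):
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        weights = np.clip(np.minimum(rising, falling), 0.0, None)
        band_power = weights * power
        total = band_power.sum()
        if total <= 0.0:
            continue  # an empty band has no defined centroid
        # Step 2: spectral centroid = power-weighted mean frequency.
        centroids.append((freqs * band_power).sum() / total)

    # Step 3: histogram of the subband centroids over 0..Nyquist.
    hist, _ = np.histogram(centroids, bins=n_bins, range=(0.0, nyquist))
    return hist
```

For a frame dominated by a single tone, the two bands that overlap the tone both report a centroid close to it, so the histogram bin containing the tone frequency collects more than one count.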


“…Indeed, in this case, for any fixed n, the corresponding cross section of the spectrogram $|(W_g f)(n, x)|^2$ is but a trigonometric polynomial in terms of the frequency x. As such, the integrals in (5) can be decomposed into a large linear combination of the symbolically-evaluated integrals (10) and (11). Indeed, SCA is a contribution to the existing literature precisely because it delivers a highly accurate computation of (5) at a reasonable cost.…”

Section: The Spectral Centroid Algorithm (mentioning)

confidence: 99%

“…Experimentation indicates (3) is less sensitive to noise than (2), making it a popular tool in speech processing [5,11,17]. Moreover, while (2) depends on pitch alone, the spectral centroid (3) depends on both pitch and timbre, a useful property in music processing [16].…”

Section: Introduction (mentioning)

confidence: 99%

Fast computation of spectral centroids

et al. 2010


The spectral centroid of a signal is the curve whose value at any given time is the centroid of the corresponding constant-time cross section of the signal's spectrogram. A spectral centroid provides a noise-robust estimate of how the dominant frequency of a signal changes over time. As such, spectral centroids are an increasingly popular tool in several signal processing applications, such as speech processing. We provide a new, fast and accurate algorithm for the real-time computation of the spectral centroid of a discrete-time signal. In particular, by exploiting discrete Fourier transforms, we show how one can compute the spectral centroid of a signal without ever needing to explicitly compute the signal's spectrogram. We then apply spectral centroids to an emerging biometrics problem: to determine a person's heart and breath rates by measuring the Doppler shifts their body movements induce in a continuous wave radar signal. We apply our algorithm to real-world radar data, obtaining heart- and breath-rate estimates that compare well against ground truth.
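The definition in the abstract above (the centroid of each constant-time cross section of the spectrogram) can be written out directly. The sketch below deliberately takes the naive route of computing every spectrogram frame, which is exactly the step the abstract says its fast algorithm avoids; it is not that algorithm. The function name `spectral_centroid_track`, the Hann window, and the frame and hop sizes are assumptions chosen for illustration.

```python
import numpy as np

def spectral_centroid_track(signal, sample_rate, frame_len=256, hop=128):
    """Spectral centroid (Hz) of each short-time frame, by direct definition."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    centroids = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # One constant-time cross section of the spectrogram.
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        # A silent frame has no defined centroid; mark it as NaN.
        if total > 0.0:
            centroids.append((freqs * power).sum() / total)
        else:
            centroids.append(np.nan)
    return np.array(centroids)
```

Applied to a steady tone, the track stays close to the tone frequency in every frame, which matches the abstract's description of the centroid as an estimate of the dominant frequency over time.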


“…Ensemble interval histograms (EIH) are probably the most well-known auditory-based features [8]. In [9], a novel feature set called sub-band spectral centroid histograms (SSCH) integrates dominant-frequency information with sub-band power information. Another type of feature widely used in current ASR systems is perceptual linear prediction (PLP) [10].…”

Section: Introduction (mentioning)

confidence: 99%

Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms

Cai, Xiao, Pan et al. 2012

IEICE Trans. Inf. & Syst.


SUMMARY: Mel Frequency Cepstral Coefficients (MFCC) are the most popular acoustic features used in automatic speech recognition (ASR), mainly because the coefficients capture the most useful information of the speech and fit well with the assumptions used in hidden Markov models. As is well known, MFCCs already employ several principles which have known counterparts in the peripheral properties of human hearing: decoupling across frequency, mel-warping of the frequency axis, log-compression of energy, etc. It is natural to introduce more mechanisms in the auditory periphery to improve the noise robustness of MFCC. In this paper, a k-nearest neighbors based frequency masking filter is proposed to reduce the audibility of spectral valleys which are sensitive to noise. Besides, Moore and Glasberg's critical band equivalent rectangular bandwidth (ERB) expression is utilized to determine the filter bandwidth. Furthermore, a new bandpass infinite impulse response (IIR) filter is proposed to imitate the temporal masking phenomenon of the human auditory system. These three auditory perceptual mechanisms are combined with the standard MFCC algorithm in order to investigate their effects on ASR performance, and a revised MFCC extraction scheme is presented. Recognition performances with the standard MFCC, RASTA perceptual linear prediction (RASTA-PLP) and the proposed feature extraction scheme are evaluated on a medium-vocabulary isolated-word recognition task and a more complex large vocabulary continuous speech recognition (LVCSR) task. Experimental results show that consistent robustness against background noise is achieved on these two tasks, and the proposed method outperforms both the standard MFCC and RASTA-PLP.
