2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2012
DOI: 10.1109/icassp.2012.6288824

Normalized amplitude modulation features for large vocabulary noise-robust speech recognition

Abstract: Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition performance with automatic speech recognition systems indicate that the human auditory system is highly robust against background noise and channel variabilities compared to automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the speech recognition model. In this work, w…

Citations: cited by 77 publications (37 citation statements)
References: 14 publications
“…For improving robustness, the normalized modulation spectra have been proposed in [23]. Similar work in the context of large vocabulary speech recognition such as noisy Wall Street Journal (New York, NY, USA) and GALE task as reported in [24,25].…”
Section: Related Work (mentioning)
confidence: 88%
“…The MFCCs were also augmented with a 10-dimensional voicing feature vector [12]. The three novel features explored were: (1) The Normalized Modulation Cepstral Coefficient (NMCC) [13], obtained from tracking the amplitude modulations of the sub-band speech signals in time domain. The produced 52-dimensional vector was reduced to 20 with principal component analysis (PCA) (NMCC20).…”
Section: SRI ASR Systems (mentioning)
confidence: 99%
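
As a rough illustration of the dimensionality-reduction step mentioned in the statement above (a 52-dimensional vector reduced to 20 with PCA), the following is a minimal sketch and not the cited authors' implementation. The function name `pca_reduce`, the random placeholder data, and the choice of an SVD to obtain the principal directions are all assumptions made for illustration; real NMCC features would replace the random matrix.

```python
import numpy as np


def pca_reduce(features, n_components=20):
    """Project row-wise feature vectors onto their top principal components.

    `features` has shape (num_frames, input_dim). Hypothetical helper,
    named here only for this example.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered from largest to smallest explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T  # shape (input_dim, n_components)
    return centered @ basis, basis, mean


# Placeholder data standing in for 52-dimensional feature frames (assumption).
rng = np.random.default_rng(0)
frames = rng.standard_normal((500, 52))

reduced, basis, mean = pca_reduce(frames, n_components=20)
print(reduced.shape)  # (500, 20)
```

In practice the basis and mean would be estimated once on training data and then applied unchanged to test frames, so that both sets share the same projection.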
“…followed by cepstral feature extraction; or (2) by using noise robust speech-processing approaches, where noise-robust transforms and/or human perception based speech analysis methodologies are deployed for acoustic-feature generation (e.g., ETSI [European Telecomm. Standards Institute] advanced frontend [4], power normalized cepstral coefficients [PNCC] [5], modulation based features [6,7], and several others).…”
Section: Introduction (mentioning)
confidence: 99%