Sparse coding for speech recognition

Sivaram, G.S.V.S.; Nemala, Sridhar Krishna; Elhilali, Mounya; Tran, Trac D.; Heřmanský, Hynek

doi:10.1109/icassp.2010.5495649

Cited by 67 publications (49 citation statements)

References 12 publications


“…In recent years, sparse representation (SR) based features have been used for speech recognition: given a segment of speech signal (frame) and a dictionary, a sparse feature vector is computed for the classification/recognition task [1], [2]. SR-based signal processing is supported by the observation that a signal can be written as a linear combination of a minimum number of atoms of a dictionary [2].…”

Section: Introduction (mentioning, confidence: 99%)
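The sparse feature vector described in this excerpt can be illustrated with orthogonal matching pursuit (OMP), one common greedy way to approximate the "minimum number of atoms" representation. This is a minimal sketch assuming a dictionary `D` whose columns are unit-norm atoms and a fixed sparsity level `n_nonzero`; the function name `omp` and the random data are hypothetical and do not reproduce the coding algorithm of any paper cited here.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit (OMP).

    Greedily selects up to n_nonzero columns (atoms) of D and refits x on the
    selected atoms by least squares after every selection.
    """
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # atom most correlated with what is still unexplained
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # least-squares coefficients on the current support
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    code = np.zeros(D.shape[1])
    code[support] = sol
    return code


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 50))
    D /= np.linalg.norm(D, axis=0)      # unit-norm atoms
    x = 2.0 * D[:, 3] - 1.5 * D[:, 17]  # a frame built from two atoms
    code = omp(D, x, n_nonzero=2)
    print("active atoms:", np.flatnonzero(code))
```

The returned vector has the dictionary's length but at most `n_nonzero` nonzero entries; in a feature-based system a vector like this would be passed to the acoustic model in place of (or alongside) the raw frame.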

“…In particular, speech recognition in exemplar-based approaches is performed either using the atom activations of the estimated sparse feature vector [3], [4], or using the minimum reconstruction error [5] between the test exemplar and its estimate. In contrast, feature-based approaches use either the derived sparse vector [1] or the estimate of speech [6] as a feature for acoustic modeling. For computing the sparse feature vector, the approaches in [3], [4] use a single overcomplete dictionary, while [5] uses multiple dictionaries corresponding to different speech units.…”

Section: Introduction (mentioning, confidence: 99%)
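The two exemplar-based decision rules in this excerpt, classifying by atom activations over a single overcomplete dictionary or by the smallest reconstruction error over class-specific dictionaries, can be contrasted in a short sketch. It assumes labelled atoms and per-class dictionaries are already available and uses OMP as a stand-in sparse solver; the function names and the summed-magnitude activation score are hypothetical choices, not the exact rules of [3], [4] or [5].

```python
import numpy as np


def omp(D, x, k):
    """Greedy k-sparse code of x over the columns of D (orthogonal matching pursuit)."""
    x = np.asarray(x, dtype=float)
    support, r, sol = [], x.copy(), np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ r)))
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        r = x - D[:, support] @ sol
    code = np.zeros(D.shape[1])
    code[support] = sol
    return code


def classify_min_residual(dicts, x, k):
    """Multiple dictionaries: pick the class whose dictionary reconstructs x best."""
    errors = {label: np.linalg.norm(x - D @ omp(D, x, k)) for label, D in dicts.items()}
    return min(errors, key=errors.get)


def classify_by_activation(D, atom_labels, x, k):
    """Single overcomplete dictionary: pick the class whose atoms carry the most
    activation (sum of absolute coefficients)."""
    code = omp(D, x, k)
    scores = {}
    for label, c in zip(atom_labels, code):
        scores[label] = scores.get(label, 0.0) + abs(c)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dicts = {u: rng.standard_normal((13, 20)) for u in ("ba", "da", "ga")}
    for D in dicts.values():
        D /= np.linalg.norm(D, axis=0)
    x = dicts["da"][:, [1, 7]] @ np.array([1.0, -0.5])  # frame synthesised from "da" atoms
    print(classify_min_residual(dicts, x, k=2))
    D_all = np.hstack(list(dicts.values()))
    labels = [u for u in dicts for _ in range(20)]
    print(classify_by_activation(D_all, labels, x, k=2))
```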

“…The proposed method is similar to [1], where the sparse vector is used as a feature. However, we propose to use multiple dictionaries instead of the single overcomplete dictionary used in [1].…”

Section: Introduction (mentioning, confidence: 99%)

“…For computing the sparse feature vector, the approaches in [3], [4] use a single overcomplete dictionary, while [5] uses multiple dictionaries corresponding to different speech units. In [1], a gradient descent approach is used to learn a single overcomplete dictionary from the spectro-temporal representation, while in [6] and [2] the mel frequency cepstral coefficients (MFCC) of training speech data (frames) are used to obtain the dictionary atoms. However, for a given train/test frame, the dictionary atoms in [6] and [2] are seeded from the training data, which results in high computational complexity.…”

Section: Introduction (mentioning, confidence: 99%)
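The contrast drawn here between a dictionary learned by gradient descent and atoms seeded from training frames can be sketched by alternating L1 sparse coding (ISTA) with a gradient step on the dictionary. This is a generic illustration under stated assumptions, not the procedure of [1], [2] or [6]: the objective, the step size `lr`, the penalty `lam` and the `init` switch are all hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise shrinkage operator used by ISTA."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def ista(D, X, lam, n_iter=100):
    """Sparse codes A (atoms x frames) approximately minimising
    0.5 * ||X - D A||_F^2 + lam * ||A||_1."""
    L = np.linalg.norm(D, 2) ** 2  # squared largest singular value: Lipschitz constant
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - X) / L, lam / L)
    return A


def learn_dictionary(X, n_atoms, lam=0.1, n_outer=30, lr=0.1, init="random", seed=0):
    """Alternate ISTA sparse coding with one gradient step on D.

    X: (feature_dim, n_frames) training frames.
    init="data" seeds the atoms with randomly chosen training frames;
    init="random" starts from Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    if init == "data":
        D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    else:
        D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    for _ in range(n_outer):
        A = ista(D, X, lam)
        # gradient of 0.5 * ||X - D A||_F^2 / n_frames with respect to D
        grad = (D @ A - X) @ A.T / X.shape[1]
        D -= lr * grad
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # project atoms back to unit norm
    return D, ista(D, X, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((13, 200))          # stand-in for 13-dim feature frames
    D, A = learn_dictionary(X, n_atoms=32)      # overcomplete: 32 atoms > 13 dims
    print(D.shape, A.shape, float(np.mean(A != 0)))
```

Renormalising the atoms after each step keeps the L1 penalty meaningful, since otherwise the codes could shrink simply by scaling the atoms up.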


Class specific GMM based sparse feature for speech units classification

Sharma, Abrol, Dileep et al. 2017

2017 25th European Signal Processing Conference (EUSIPCO)


Abstract: In this paper, features based on the sparse representation (SR) are proposed for the classification of speech units. The proposed method employs multiple dictionaries to effectively model the variations present in the speech signal. Here, a Gaussian mixture model (GMM) is built using spectral features corresponding to frames of all the examples of a speech class. Multiple dictionaries corresponding to the different mixture components are learned using the respective speech frames. Given a train/test speech frame, the minimum spectral distance from the GMM means is used to select an appropriate dictionary. The selected dictionary is used to obtain the sparse feature representation, which is used for the classification of speech units. The effectiveness of the proposed feature is demonstrated using continuous density hidden Markov model (CDHMM) based classifiers for (i) classification of isolated utterances of the E-set of the English alphabet, (ii) classification of consonant-vowel (CV) segments in Hindi and (iii) phoneme classification on the TIMIT phonetic corpus. Experimental results reveal that the proposed features outperform existing feature representations for various speech unit classification tasks.
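The dictionary-selection step in this abstract can be sketched as follows, assuming the GMM for a class has already been fitted (its component means stacked in `means`) and one dictionary has been learned per component. The abstract does not name the distance measure or the sparse solver, so Euclidean distance and OMP are assumptions here, as are all function names.

```python
import numpy as np


def omp(D, x, k):
    """Greedy k-sparse code of x over the columns of D (orthogonal matching pursuit)."""
    x = np.asarray(x, dtype=float)
    support, r, sol = [], x.copy(), np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ r)))
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        r = x - D[:, support] @ sol
    code = np.zeros(D.shape[1])
    code[support] = sol
    return code


def select_dictionary(means, x):
    """Index of the GMM component mean closest to frame x (Euclidean distance assumed)."""
    return int(np.argmin(np.linalg.norm(means - x, axis=1)))


def gmm_sparse_feature(means, dictionaries, x, k):
    """Select a dictionary via the nearest GMM mean, then sparse-code x with it."""
    idx = select_dictionary(means, x)
    return idx, omp(dictionaries[idx], x, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_components, dim, n_atoms = 4, 13, 32
    means = 3.0 * rng.standard_normal((n_components, dim))  # stand-in for fitted GMM means
    dictionaries = [rng.standard_normal((dim, n_atoms)) for _ in range(n_components)]
    for D in dictionaries:
        D /= np.linalg.norm(D, axis=0)
    frame = means[2] + 0.1 * rng.standard_normal(dim)
    idx, feature = gmm_sparse_feature(means, dictionaries, frame, k=5)
    print(idx, np.count_nonzero(feature))
```

Selecting one dictionary per frame keeps the sparse coding cost to a single, smaller dictionary, which is the efficiency argument the citing excerpts make against seeding atoms from the full training set.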




“…However, compared with image processing, audio processing has paid less attention to sparse coding, which has been applied to speech recognition [10], speaker identification [11], speech enhancement [12] and so on. Furthermore, [13] proposed a novel algorithm for computing SISC (shift-invariant sparse coding) aimed at audio classification.…”

Section: Introduction (mentioning, confidence: 99%)

MFCC combined with sparse coding for sound event classification under different noise environments

Mao, Wu, Liu et al. 2017

Proceedings of the 2nd Annual International Conference on Electronics, Electrical Engineering and Information Science (EEEIS 20


In recent years, the most popular methods for sound event classification fall into two types: 1) extract MFCC or PLP features, then train a classifier; 2) convert the sound into a spectrogram, then apply image classification methods. However, neither method has achieved satisfactory performance. To improve classification performance, we present a sound event classification method based on MFCC and sparse coding, which is effective at capturing high-level representations of the input data. The sparse coding coefficients are then employed as new sound event features to train the classification model. Our experimental results demonstrate strong robustness and adaptability, and an obvious improvement in sound event classification.
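The MFCC-plus-sparse-coding pipeline in this abstract can be illustrated by encoding each MFCC frame over a dictionary and turning the frame codes into one fixed-length vector per clip. This is a sketch under assumptions: MFCCs are taken as already computed, ISTA is used as the sparse coder, and max-pooling of code magnitudes over time is a hypothetical choice of how frame coefficients become a clip-level feature; the abstract does not say which solver or pooling the authors used.

```python
import numpy as np


def ista_codes(D, X, lam=0.1, n_iter=200):
    """L1 sparse codes (atoms x frames) for the columns of X via ISTA."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the data-term gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = A - D.T @ (D @ A - X) / L                          # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A


def clip_feature(D, mfcc_frames, lam=0.1):
    """mfcc_frames: (n_coeffs, n_frames). Returns one fixed-length vector per clip."""
    A = ista_codes(D, mfcc_frames, lam)
    return np.abs(A).max(axis=1)  # max-pool code magnitudes over time (assumed pooling)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((13, 32))
    D /= np.linalg.norm(D, axis=0)
    mfcc = rng.standard_normal((13, 40))  # stand-in for 40 frames of 13 MFCCs
    feature = clip_feature(D, mfcc)
    print(feature.shape)
```

Because the output length equals the number of atoms regardless of clip duration, vectors like this can be fed directly to a standard classifier.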


Automatic Speech Recognition

Renals, King 2010

The Handbook of Phonetic Sciences


In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in hidden Markov models (HMMs). In a conventional system, the parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated and, consequently, a large amount of training data is required to fit the model. In addition, different sources of acoustic variability that impact the accuracy of a recogniser, such as pronunciation variation, accent, speaker factors and environmental noise, are only weakly modelled and factorized by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) adaptation and vocal tract length normalisation (VTLN). In this thesis, we discuss an alternative acoustic modelling approach, the subspace Gaussian mixture model (SGMM), which is expected to deal with these two issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that can capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorizes the phonetic and speaker factors, and within this framework other sources of acoustic variability may also be explored. In this thesis, we propose a regularised model estimation for SGMMs, which avoids overtraining when the training data is sparse. We also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target language system.
In this case, only the state-dependent parameters need to be estimated, which relaxes the requirement on the amount of training data. To improve the robustness of SGMMs against environmental noise, we propose to apply the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We report experimental results on the Wall Street Journal (WSJ) database and GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.

Acknowledgements: First and foremost, I would like to thank my supervisor Prof. Steve Renals for his expert guidance, deep insight into speech technology, patience and encouragement all the way through my doctoral study. It is a great pleasure to work with this wonderful mentor. I also owe a huge debt of gratitude to Dr. Arnab Ghoshal, a colleague, friend and advisor, without whom reaching this thesis would definitely have been much tougher. Thanks also to KK Chin, my industry supervisor at Toshiba Cambridge Research Laboratory (CRL). It is a...
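The SGMM abstract's claim that only a small number of state-dependent parameters are needed can be made concrete with the commonly used SGMM parameterisation: each state j has a low-dimensional vector v_j, the mean of shared Gaussian i is M_i v_j, its weight is a softmax of w_i^T v_j over the I shared Gaussians, and the covariances Sigma_i are shared too. The sketch below assumes a single substate per state and ignores the speaker subspace in the parameter count; the function names and the model sizes in the usage lines are hypothetical, chosen only to illustrate the arithmetic, and are not figures from the thesis.

```python
import numpy as np


def sgmm_state_gmm(M, w, v_j, N=None, v_s=None):
    """Derive the GMM of one HMM state from its low-dimensional state vector.

    M:   (I, D, S) shared mean-projection matrices M_i
    w:   (I, S) shared weight-projection vectors w_i
    v_j: (S,) state vector, the only state-specific parameter here
    N, v_s: optional shared speaker projections (I, D, T) and speaker vector (T,)
    """
    means = M @ v_j                  # (I, D): mu_ji = M_i v_j
    if N is not None and v_s is not None:
        means = means + N @ v_s      # speaker-dependent offset N_i v^(s)
    logits = w @ v_j                 # (I,): w_i^T v_j
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()         # softmax over the I shared Gaussians
    return means, weights


def param_counts(J, G, I, D, S):
    """Rough parameter counts (no speaker subspace, one substate per state).

    Conventional: J states x G diagonal-covariance Gaussians (mean, variance, weight).
    SGMM: J state vectors of size S, plus shared M_i, w_i and full covariances Sigma_i.
    """
    conventional = J * G * (2 * D + 1)
    sgmm = J * S + I * D * S + I * S + I * D * (D + 1) // 2
    return conventional, sgmm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    I, D, S = 4, 3, 2
    means, weights = sgmm_state_gmm(rng.standard_normal((I, D, S)),
                                    rng.standard_normal((I, S)),
                                    rng.standard_normal(S))
    print(means.shape, weights.sum())
    # hypothetical model sizes, for illustration only
    print(param_counts(J=2000, G=16, I=400, D=39, S=40))
```

Most SGMM parameters sit in the shared M_i, w_i and Sigma_i, which is why, as the abstract notes, a subspace estimated on out-of-domain data can be ported and only the per-state vectors re-estimated.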
