Analysis of MLP-Based Hierarchical Phoneme Posterior Probability Estimator

Pinto, Joel; Garimella, Sri; Magimai-Doss, Mathew; Heřmanský, Hynek; Bourlard, Hervé

doi:10.1109/tasl.2010.2045943

Cited by 68 publications

(75 citation statements)

References 49 publications


“…If this second stage net is trained on several neighboring frames (similar to the MFCC-based net), then it is able to correct some of the errors of the lower stage net(s) with the help of the long-term context. Hence, applying such a second stage network is already useful in itself, as was recently shown in [15] or [16]. We will compare our earlier results with two 2-stage configurations: the first is trained only on the MFCC-based posteriors, while the second combines the MFCC-based and the 2D-DCT based probabilities.…”

Section: Noisy Speech Experiments (mentioning)

confidence: 79%

“…Both physiological and psychoacoustic experimental results indicate that the human brain extracts information from much longer time spans. Technically the simplest solution for this is to work with larger windows along the time axis: in neural-net based recognizers it is now standard practice to train the system on 9 or more neighboring MFCC vectors [7,15,16]. However, there is also evidence that the brain processes relatively narrow frequency bands quasi-separately [2,6].…”

Section: Localized Spectro-temporal Features (mentioning)

confidence: 99%
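The two statements above both rely on feeding a net with several neighboring frames. A minimal sketch of that context stacking follows, assuming a symmetric window and edge frames padded by repetition (both are illustrative assumptions, and `stack_context` is a hypothetical helper, not taken from the cited papers). The same operation could be applied to MFCC vectors for a first-stage net or to posterior vectors for a second-stage net.

```python
def stack_context(frames, context=9):
    """Concatenate each frame with its neighbors into one input vector.

    frames  : list of equal-length feature vectors (lists of floats)
    context : odd window length in frames (9 is the value quoted above)
    Edge frames are padded by repeating the first/last frame; this
    padding choice is an assumption made for illustration.
    """
    if context < 1 or context % 2 == 0:
        raise ValueError("context must be a positive odd number")
    half = context // 2
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for offset in range(-half, half + 1):
            # Clamp the index so that edge frames reuse the boundary frame.
            idx = min(max(t + offset, 0), last)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


# Toy input: 6 frames of 2-dimensional features, window of 3 frames.
toy = [[float(2 * t), float(2 * t + 1)] for t in range(6)]
stacked = stack_context(toy, context=3)
print(len(stacked), len(stacked[0]))
```

With a 9-frame window and 39-dimensional features, each stacked vector would have 351 dimensions.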


Phone recognition experiments with 2D-DCT spectro-temporal features

Kovács

Tóth

2011

2011 6th IEEE International Symposium on Applied Computational Intelligence and Informatics (SACI)


Abstract: Localized spectro-temporal analysis is a novel feature extraction strategy in speech recognition, which was inspired by neurophysiological findings. Here we perform phone recognition experiments on features that are extracted from the patches of the critical-band log-energy spectrum by applying the two-dimensional cosine transform. We find that in phone recognition experiments the proposed feature set yields results similar to the standard MFCC features under clean conditions, while it provides a significantly smaller performance degradation in noisy conditions. Moreover, we show that the new and the standard features can be readily combined to improve the recognition accuracy still further.


“…The 61 hand-labeled phonetic symbols are mapped to a set of 39 phonemes with an additional garbage class [7]. The experimental setup is exactly the same as the one described in [8]. All the MLPs (for phoneme posterior and articulatory posterior estimation) use the PLP cepstral coefficients with a context window of 9 frames as input.…”

Section: Methods (mentioning)

confidence: 99%

“…The size of the hidden layer of all the MLPs is determined by fixing the total number of parameters to 35% of the training data following the previous work [8]. The articulatory posteriors and phoneme posteriors are estimated from MLP trained using ICSI Quicknet software.…”

Section: Methods (mentioning)

confidence: 99%
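The sizing rule quoted above can be sketched as follows. This is only an illustration under stated assumptions: "35% of the training data" is read as 0.35 times the number of training frames, the MLP has a single hidden layer with bias terms, and the function names and example sizes are hypothetical rather than taken from the cited work.

```python
def mlp_param_count(in_dim, hidden, out_dim):
    """Weights plus biases of a single-hidden-layer MLP."""
    return (in_dim + 1) * hidden + (hidden + 1) * out_dim


def hidden_units_for_budget(n_train_frames, in_dim, out_dim, ratio=0.35):
    """Largest hidden layer whose parameter count fits the budget.

    The budget is ratio * n_train_frames (assumed reading of the rule).
    Solving (in_dim + 1) * H + (H + 1) * out_dim <= budget for H gives
    H <= (budget - out_dim) / (in_dim + out_dim + 1).
    """
    budget = int(ratio * n_train_frames)
    hidden = (budget - out_dim) // (in_dim + out_dim + 1)
    if hidden < 1:
        raise ValueError("parameter budget too small for this network")
    return hidden


# Hypothetical sizes: 1,000,000 training frames, 351 inputs
# (39 coefficients x 9 frames), 40 output classes.
h = hidden_units_for_budget(1_000_000, in_dim=351, out_dim=40)
print(h, mlp_param_count(351, h, 40))
```

Tying the hidden layer size to the amount of training data keeps the ratio of parameters to training examples fixed across the compared networks, so differences in accuracy are less likely to come from model capacity alone.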

“…This approach showed improvements in articulatory feature classification compared to an equivalent system where they were treated independently. Motivated from the previous studies [4], [10], and the hierarchical MLP framework [8], we investigate a novel multi-stage MLP classifier based approach to model the inter-feature dependencies of articulatory features.…”

Section: Fig 2 Multi-stage Mlp Classifiers For Articulatory Posteri (mentioning)

confidence: 99%


Integrating articulatory features using Kullback-Leibler divergence based acoustic model for phoneme recognition

Rasipuram

Mathew

2011

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


In this paper, we propose a novel framework to integrate articulatory features (AFs) into HMM-based ASR system. This is achieved by using posterior probabilities of different AFs (estimated by multilayer perceptrons) directly as observation features in Kullback-Leibler divergence based HMM (KL-HMM) system. On the TIMIT phoneme recognition task, the proposed framework yields a phoneme recognition accuracy of 72.4% which is comparable to KL-HMM system using posterior probabilities of phonemes as features (72.7%). Furthermore, a best performance of 73.5% phoneme recognition accuracy is achieved by jointly modeling AF probabilities and phoneme probabilities as features. This shows the efficacy and flexibility of the proposed approach.
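To make the idea of using posterior probabilities directly as observation features more concrete, here is a minimal sketch of a Kullback-Leibler divergence between a state's categorical distribution and a posterior feature vector. The direction of the divergence, the flooring constant, and the names `kl_divergence`, `state_dist`, and `posterior` are assumptions for illustration; the abstract does not specify which variant the authors use.

```python
import math


def kl_divergence(p, q, floor=1e-10):
    """KL(p || q) for two discrete distributions over the same classes.

    Terms with p[i] == 0 contribute nothing; q is floored to avoid
    division by zero (the floor value is an illustrative choice).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, floor))
    return total


# Hypothetical 3-class example: one state distribution, two posterior frames.
state_dist = [0.8, 0.1, 0.1]
close_frame = [0.7, 0.2, 0.1]
far_frame = [0.1, 0.1, 0.8]
print(kl_divergence(state_dist, close_frame) < kl_divergence(state_dist, far_frame))
```

A lower divergence means the posterior frame matches the state more closely, so the divergence can play the role of a local score in decoding.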


Hierarchical deep belief networks based point process model for keywords spotting in continuous speech

Wang

Yang

et al. 2013

Int J Communication


Summary: The point process model keyword spotting (KWS) system has attracted considerable attention in the area of keyword spotting because of its capacity to generalize from a relatively small number of training examples. Unfortunately, the accuracy of the point process model is not comparable with that of state-of-the-art KWS systems because of the poor modeling capacity of the phoneme detector, which is based on Gaussian Mixture Models. In this paper, focusing on improving the performance of the detector in the point process model, we propose an enhanced version of the point process model based on hierarchical deep belief networks (DBNs). Hierarchical DBNs are used as the phoneme detector in this system, and they combine the advantages of both the DBN and the hierarchical architecture for capturing complex statistical patterns in speech while overcoming the inherent flaws of conventional hidden Markov models and multilayer perceptrons. Experimental results on the TIMIT database show that the proposed method can yield a 2% improvement. Furthermore, when training examples are extremely limited, it can achieve better results than state-of-the-art KWS systems. Copyright © 2013 John Wiley & Sons, Ltd.
