2011
DOI: 10.1109/tasl.2010.2045943

Analysis of MLP-Based Hierarchical Phoneme Posterior Probability Estimator

Abstract: We analyze a simple hierarchical architecture consisting of two multilayer perceptron (MLP) classifiers in tandem to estimate the phonetic class conditional probabilities. In this hierarchical setup, the first MLP classifier is trained using standard acoustic features. The second MLP is trained using the posterior probabilities of phonemes estimated by the first, but with a long temporal context of around 150-230 ms. Through extensive phoneme recognition experiments, and the analysis of the trained se…
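
To make the tandem setup in the abstract concrete, the sketch below shows one way the input to the second MLP could be assembled from the first MLP's per-frame phoneme posteriors. It is only an illustration under stated assumptions: the 10 ms frame shift, the edge padding by frame repetition, and the names `stack_context` and `context_frames` are hypothetical and not taken from the paper.

```python
def context_frames(context_ms, frame_shift_ms=10):
    """Number of frames covered by a temporal context of `context_ms`.

    The 10 ms frame shift is an assumption, not a value stated in the abstract.
    """
    return context_ms // frame_shift_ms


def stack_context(posteriors, half_width):
    """Concatenate each frame with its neighbours within +/- half_width frames.

    posteriors: list of per-frame posterior vectors (lists of floats).
    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    n = len(posteriors)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-half_width, half_width + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp at utterance edges
            window.extend(posteriors[idx])
        stacked.append(window)
    return stacked


# Toy first-stage output: 5 frames, 3 phoneme classes (values are made up).
first_mlp_out = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]

# Each second-stage input row holds (2 * half_width + 1) * 3 values.
second_mlp_in = stack_context(first_mlp_out, half_width=1)
```

Under the assumed 10 ms shift, the 150-230 ms context mentioned in the abstract would span roughly `context_frames(150)` to `context_frames(230)` frames, so a realistic `half_width` would be considerably larger than the toy value used here.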

Citations: cited by 68 publications (75 citation statements)
References: 49 publications
“…If this second stage net is trained on several neighboring frames - similar to the MFCC-based net - then it is able to correct some of the errors of the lower stage net(s) with the help of the long-term context. Hence, applying such a second stage network is already useful in itself, as was recently shown in [15] or [16]. We will compare our earlier results with two 2-stage configurations: the first is trained only on the MFCC-based posteriors, while the second combines the MFCC-based and the 2D-DCT based probabilities.…”
Section: Noisy Speech Experiments (mentioning)
confidence: 79%
“…Both physiological and psychoacoustic experimental results indicate that the human brain extracts information from much longer time spans. Technically the simplest solution for this is to work with larger windows along the time axis: in neural-net based recognizers it is now standard practice to train the system on 9 or more neighboring MFCC vectors [7,15,16]. However, there is also evidence that the brain processes relatively narrow frequency bands quasi-separately [2,6].…”
Section: Localized Spectro-temporal Features (mentioning)
confidence: 99%
“…The 61 hand labeled phonetic symbols are mapped to set of 39 phonemes with an additional garbage class [7]. The experimental setup is exactly same as the one described in [8]. All the MLPs (for phoneme posterior and articulatory posterior estimation) use the PLP cepstral coefficients with a context window of 9 frames as input.…”
Section: Methods (mentioning)
confidence: 99%
“…The size of the hidden layer of all the MLPs is determined by fixing the total number of parameters to 35% of the training data following the previous work [8]. The articulatory posteriors and phoneme posteriors are estimated from MLP trained using ICSI Quicknet software.…”
Section: Methods (mentioning)
confidence: 99%
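
The parameter-budget rule quoted above can be worked through as a small calculation. The sketch below is an assumption-laden reading of it, not the cited authors' procedure: it assumes "35% of the training data" means 35% of the number of training frames, assumes a single hidden layer with bias terms, and uses a hypothetical helper name, frame count, and 39-dimensional PLP feature size. Only the 9-frame context window and the 39 phonemes plus one garbage class come from the quoted statements.

```python
def hidden_units_for_budget(n_frames, n_inputs, n_outputs, ratio=0.35):
    """Largest hidden-layer size whose parameter count fits the budget.

    For one hidden layer with biases, the parameter count is
        (n_inputs + 1) * h + (h + 1) * n_outputs
      = h * (n_inputs + n_outputs + 1) + n_outputs,
    so h is obtained by solving for the budget ratio * n_frames.
    """
    budget = ratio * n_frames
    h = int((budget - n_outputs) // (n_inputs + n_outputs + 1))
    return max(h, 1)


def parameter_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a single-hidden-layer MLP."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


# Hypothetical numbers: 1,000,000 training frames, 39-dim features (assumed)
# stacked over a 9-frame window, 39 phonemes + 1 garbage class as outputs.
n_frames = 1_000_000
n_inputs = 9 * 39
n_outputs = 40

n_hidden = hidden_units_for_budget(n_frames, n_inputs, n_outputs)
n_params = parameter_count(n_inputs, n_hidden, n_outputs)
```

Tying the hidden-layer size to the amount of training data in this way keeps the ratio of parameters to training frames fixed across networks with different input dimensions, which is one plausible motivation for the rule.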