2005
DOI: 10.1109/tsa.2004.834466

Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system

Abstract: Recently, the advantages of the spectral parameters obtained by frequency filtering (FF) of the logarithmic filter-bank energies (logFBEs) have been reported. These parameters, which are frequency derivatives of the logFBEs, lie in the frequency domain, and have shown good recognition performance with respect to the conventional MFCCs for HMM systems. In this paper, the FF features are first compared with the MFCCs and the Rasta-PLP features using both a hybrid HMM/MLP and a usual HMM/GMM recognition system, f…

Cited by 33 publications (16 citation statements)
References 14 publications
“…These probabilities are called the model parameters and can be estimated effectively by using the Baum-Welch algorithm (Juang & Rabiner, 1991). An HMM structure (Pujol et al., 2005) can be expressed in matrix form as A = [a_ij]: when a_ij = 1, there exists a transition from state i to state j, and when a_ij = 0, the transition does not exist. An HMM is a finite-state machine that changes state once every time unit.…”
Section: Development Of Training Models
confidence: 99%
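As an illustration of the transition-structure matrix described in this statement, here is a minimal NumPy sketch; the left-to-right topology and the probability values are invented for illustration, not taken from the cited paper.

```python
import numpy as np

# Hypothetical 4-state left-to-right HMM topology, common in ASR:
# each state may loop on itself or advance to the next state.
A = np.array([
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],   # final state absorbs
])

# The binary structure matrix described in the text: 1 where a
# transition exists, 0 where it does not.
structure = (A > 0).astype(int)
print(structure)

# One time unit of the finite-state machine: the state distribution
# p_t evolves as p_{t+1} = p_t @ A.
p = np.array([1.0, 0.0, 0.0, 0.0])  # start in state 0
for _ in range(3):
    p = p @ A
print(p)
```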
“…From the parameterization point of view, features do not need to be uncorrelated, because the network learns the local correlation between its input units. This has been used to include alternative features such as spectro-temporal parameters obtained by frequency filtering (FF) [4] or linear prediction [24], [25], or speech production knowledge in the form of articulatory features, which led to more robust systems [26]-[28]. Most noteworthy, the possibility of augmenting the time-span in the feature extraction procedure together with the various methods available for combining these features (multistream, concatenation, probabilistic, etc.)…”
Section: A. Motivation
confidence: 99%
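To make the FF parameterization concrete, here is a hedged NumPy sketch. It assumes a first-order difference filter applied along the frequency axis of the log filter-bank energies, consistent with the abstract's description of FF features as frequency derivatives of the logFBEs; the function name, filter taps, and array shapes are all illustrative.

```python
import numpy as np

def frequency_filter(log_fbe, h=(1.0, 0.0, -1.0)):
    """Filter along the frequency (filter-bank channel) axis.

    log_fbe: array of shape (frames, n_bands) of log filter-bank energies.
    h: FIR taps; (1, 0, -1) is a first-order frequency-derivative filter,
       one plausible choice given the description of FF in the abstract.
    """
    out = np.empty_like(log_fbe)
    for t in range(log_fbe.shape[0]):
        # 'same' mode keeps one output coefficient per band.
        out[t] = np.convolve(log_fbe[t], h, mode="same")
    return out

# Toy usage with random logFBEs (20 frames, 24 mel bands).
rng = np.random.default_rng(0)
ff_feats = frequency_filter(np.log(rng.random((20, 24)) + 1e-6))
print(ff_feats.shape)  # (20, 24)
```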
“…However, HMM-based ASR systems seem to be close to reaching their performance limit. Hybrid systems based on a combination of artificial neural networks (ANNs) and HMMs, referred to as hybrid ANN/HMM [1]-[3], provide significant performance improvements in noisy conditions [4], [5]. However, progress on this paradigm has been hindered by its computational training requirements.…”
Section: Introduction
confidence: 99%
“…That is, an MLP trained to perform classification is a class-conditional posterior probability estimator. If we associate each output neuron with a given class C_i, then the MLP output value for an input X will be an estimate of the posterior probability P(C_i|X) of the corresponding class C_i given the input [24].…”
Section: Hybrid Recognition System
confidence: 99%
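A minimal sketch of this interpretation, with invented logits and priors: a softmax output layer produces estimates of the posteriors P(C_i|X), which hybrid ANN/HMM systems commonly rescale by the class priors to obtain scaled likelihoods for the HMM decoder. The division by priors is a standard hybrid-system step, not necessarily the exact procedure of the cited paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical MLP output logits for 3 classes given one input frame X.
logits = np.array([2.0, 0.5, -1.0])
posteriors = softmax(logits)        # estimates of P(C_i | X)

# The HMM decoder expects P(X | C_i) up to a constant factor, so the
# posteriors are divided by the (assumed) class priors P(C_i).
priors = np.array([0.5, 0.3, 0.2])
scaled_likelihoods = posteriors / priors
print(posteriors, scaled_likelihoods)
```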
“…Artificial neural networks (ANNs), and more specifically multilayer perceptrons (MLPs), appeared to be a promising alternative in this respect for replacing or assisting HMMs in the classification stage. Accordingly, a number of ANN approaches have been suggested and used to improve the state of the art of ASR systems [20], [24]. The fundamental advantage of such an approach is that it introduces discriminative training [18].…”
Section: Introduction
confidence: 99%
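As a sketch of the discriminative training referred to here, assuming a single-layer softmax classifier trained with cross-entropy (all data, shapes, and the learning rate are invented), each gradient step directly raises the posterior of the correct class at the expense of the competing classes:

```python
import numpy as np

# Toy frame-level training set: 8 frames of 13-dim features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 13))
y = rng.integers(0, 3, size=8)
W = np.zeros((13, 3))

for _ in range(100):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)   # posteriors P(C_i | x)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1.0   # d(cross-entropy)/d(logits)
    W -= 0.1 * (X.T @ grad) / len(y)    # gradient-descent step

# Average posterior assigned to the target classes rises with training.
print(p[np.arange(len(y)), y].mean())
```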