2000
DOI: 10.1016/s0020-0255(99)00106-1

Hybrid HMM–NN modeling of stationary–transitional units for continuous speech recognition

Cited by 24 publications (15 citation statements)
References 8 publications
“…In the experiments reported in this paper, a single phonetic decoder has been used, i.e. the Loquendo-ASR recognizer for Italian language [15], which generates a lattice of hypotheses. We estimate n-gram statistics up to the third order from these hypotheses leading to a 44135-dimensional feature space.…”
Section: A Phonetic Models (mentioning)
confidence: 99%
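The n-gram feature space described in the statement above can be illustrated with a short sketch. It counts phone n-grams up to the third order and computes the size of the resulting feature space. Two things here are assumptions rather than content of the quoted text: the counts are taken from a single made-up phone sequence instead of a lattice of hypotheses, and the inventory size of 35 symbols is only an inference from the arithmetic, since 35 + 35^2 + 35^3 = 44135. The function names are hypothetical.

```python
from collections import Counter


def ngram_space_size(n_symbols, max_order=3):
    """Number of distinct n-grams of order 1..max_order over n_symbols symbols."""
    return sum(n_symbols ** k for k in range(1, max_order + 1))


def count_ngrams(phones, max_order=3):
    """Count every n-gram of order 1..max_order in one phone sequence.

    Simplification (assumption): the cited work estimates statistics from a
    lattice of hypotheses; this sketch uses a single flat sequence.
    """
    counts = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(phones) - order + 1):
            counts[tuple(phones[i:i + order])] += 1
    return counts


# Inferred, not stated in the quote: a 35-symbol inventory gives 44135 features.
print(ngram_space_size(35))  # 44135

phones = ["k", "a", "z", "a"]  # made-up example sequence
counts = count_ngrams(phones)
print(counts[("a",)], counts[("k", "a")], counts[("a", "z", "a")])
```

Each utterance would then map to a sparse vector indexed by these n-grams; most of the 44135 entries stay at zero for any single utterance.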
“…NNs are commonly applied to voice and language related problems, for example McNamara et al (1998) and Albesano et al (2000). Speech recognition requires the ability to deal with temporal sequences, for which a variety of methods have been applied.…”
Section: Neural Network (mentioning)
confidence: 99%
“…The Loquendo-ASR system uses acoustic models based on a hybrid combination of Hidden Markov Models (HMM) and Multi Layer Perceptron (MLP), where each phonetic unit is described in terms of a single or double state left-to-right automaton with self-loops, the HMM transition probabilities are uniform and fixed, and the emission probabilities are computed by an MLP [16]. This MLP has an input layer of 273 units, a first hidden layer of 315 units, a second hidden layer of 315 units and an output layer including a variable number of units that is language dependent (600 to 1000).…”
Section: A. The ANN Architecture (mentioning)
confidence: 99%
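The architecture quoted above lends itself to a small worked example. The sketch below does two things: it counts the trainable parameters of an MLP with the quoted layer sizes (273, 315, 315, and 600 to 1000 outputs), and it scores a left-to-right automaton with self-loops using per-frame log emission scores. Several points are assumptions and not taken from the quoted text: the layers are treated as fully connected with one bias per unit, the path is required to start in the first state and end in the last, and the transition terms are dropped because uniform, fixed transitions add the same constant to every path of a given length. The function names and the emission values are hypothetical.

```python
def mlp_param_count(layer_sizes, with_bias=True):
    """Weights (plus optional biases) of a fully connected feed-forward net.

    Assumption: dense layers with one bias per unit; the quoted text gives
    only the layer sizes.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + (n_out if with_bias else 0)
    return total


def best_left_to_right_score(log_emissions):
    """Best path score through a left-to-right automaton with self-loops.

    log_emissions[t][s] is the log emission score of state s at frame t
    (in a hybrid system these would be derived from the MLP outputs).
    At each frame the path either stays in its state or advances by one.
    Transition scores are omitted: uniform, fixed transitions contribute
    the same constant to every path of the same length.
    """
    n_states = len(log_emissions[0])
    neg_inf = float("-inf")
    prev = [neg_inf] * n_states
    prev[0] = log_emissions[0][0]  # paths must start in the first state
    for frame in log_emissions[1:]:
        cur = [neg_inf] * n_states
        for s in range(n_states):
            advance = prev[s - 1] if s > 0 else neg_inf
            best = max(prev[s], advance)
            if best > neg_inf:
                cur[s] = best + frame[s]
        prev = cur
    return prev[-1]  # paths must end in the last state


small = mlp_param_count([273, 315, 315, 600])
large = mlp_param_count([273, 315, 315, 1000])
print(small, large)  # 375450 501850

# Made-up log scores for a double-state unit over three frames.
toy = [[-1.0, -5.0], [-2.0, -1.0], [-4.0, -0.5]]
print(best_left_to_right_score(toy))  # -2.5
```

Under these assumptions the network has roughly 375 thousand to 502 thousand parameters depending on the language-dependent output size, and the output layer accounts for most of the difference.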
“…The acoustic models are based on a set of vocabulary and gender independent units including stationary context-independent phones and diphone-transition coarticulation models [16]. These acoustic models have been successfully used for the 15 languages released with the Loquendo ASR recognizer, and are the seed models for the adaptation experiments of Section V, if not differently specified.…”
Section: A. The ANN Architecture (mentioning)
confidence: 99%