1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings
DOI: 10.1109/asru.1997.659007

Syllable - a promising recognition unit for LVCSR

Cited by 16 publications
(6 citation statements)
References 3 publications
“…Four passes of Baum-Welch reestimation were used to reestimate the triphone model parameters. The number of Gaussians was, however, reduced by tying states [20]. Finally, these models were increased to eight Gaussians per state using a standard divide-by-two clustering algorithm.…”
Section: B. Baseline Triphone System (mentioning)
confidence: 99%
“…Syllables as a basic recognition unit were again analyzed by a team of researchers in Ganapathiraju et al. [1997]. This work created a set of models for the 200 most common monosyllabic words, and a second set of syllable models for the syllables in the remaining monosyllabic words and multiple syllable words.…”
Section: Syllables As a Recognition Unit (mentioning)
confidence: 99%
“…A plausible solution to these constraints was to select the male speaker subset [6] from the WS'97 DevTest, and to reserve 10 utterances from each test speaker for adaptation. This resulted in 1241 utterances consisting of 23 speakers, 24 conversation sides, and approximately 50 minutes of speech.…”
Section: N-best Rescoring Experiments (mentioning)
confidence: 99%
“…This resulted in 1241 utterances consisting of 23 speakers, 24 conversation sides, and approximately 50 minutes of speech. We used a baseline context-dependent phone HMM system [6] to generate N-best lists and time alignments for the reference transcription and the 100-best hypotheses. The HDM systems rescored these hypotheses, and the resulting sentence hypotheses were scored using standard NIST scoring software and presented in terms of word error rate (WER).…”
Section: N-best Rescoring Experiments (mentioning)
confidence: 99%
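
The last citation statement reports results as word error rate (WER) after N-best rescoring. As a rough illustration of what that metric measures, here is a minimal Python sketch. It is not the NIST scoring software mentioned in the statement. The function names `word_error_rate` and `pick_best` are hypothetical, and the sketch assumes whitespace tokenization and equal costs for substitutions, insertions, and deletions.

```python
from typing import Callable, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and unit edit costs; real scoring
    tools apply text normalization that is not modeled here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def pick_best(hypotheses: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the hypothesis with the highest score (hypothetical rescorer)."""
    if not hypotheses:
        raise ValueError("hypotheses must not be empty")
    return max(hypotheses, key=score)
```

In an N-best rescoring setup like the one described, a new model would assign a score to each hypothesis in the list, something like `pick_best` would choose the top-scoring one, and the chosen sentence would then be compared against the reference transcription with a WER computation.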