1988
DOI: 10.1147/rd.322.0227

Multilevel decoding for Very-Large-Size-Dictionary speech recognition

Abstract: An important concern in the field of speech recognition is the size of the vocabulary that a recognition system is able to support. Large vocabularies introduce difficulties involving the amount of computation the system must perform and the number of ambiguities it must resolve. But, for practical applications in general and for dictation tasks in particular, large vocabularies are required, because of the difficulties and inconveniences involved in restricting the speaker to the use of a limited vocabulary. …

Cited by 21 publications (6 citation statements)
References 18 publications

“…Both phonetic and triphone models were trained using the forward-backward algorithm in the usual manner; the triphone statistics were smoothed back onto the underlying phonetic models via deleted estimation. The triphone and phonetic models used a seven-state topology with independent distributions for the beginning, middle, and end of each phone, as described in [12]. Results are shown in the fourth column of Table 1.…”
Section: Results (mentioning)
confidence: 99%
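
The statement above refers to smoothing sparse triphone statistics back onto the underlying phone models via deleted estimation. Below is a minimal sketch of that smoothing step for discrete output distributions, with the interpolation weight re-estimated by EM on held-out counts; the function names and toy data are illustrative assumptions, not the cited system's actual implementation.

# Deleted-interpolation smoothing sketch: a sparse triphone distribution is
# interpolated with its underlying phone distribution, and the interpolation
# weight lam is re-estimated by EM on held-out counts.

def estimate_lambda(heldout_counts, p_tri, p_phone, iters=20):
    """EM re-estimation of the interpolation weight on held-out counts."""
    lam = 0.5
    for _ in range(iters):
        num, total = 0.0, 0.0
        for symbol, count in heldout_counts.items():
            tri = lam * p_tri.get(symbol, 0.0)
            backoff = (1.0 - lam) * p_phone.get(symbol, 0.0)
            if tri + backoff > 0.0:
                num += count * tri / (tri + backoff)
            total += count
        lam = num / total if total else lam
    return lam

def smooth(p_tri, p_phone, lam):
    """Interpolated distribution: lam * triphone + (1 - lam) * phone."""
    symbols = set(p_tri) | set(p_phone)
    return {s: lam * p_tri.get(s, 0.0) + (1.0 - lam) * p_phone.get(s, 0.0)
            for s in symbols}

# Usage: a rarely seen triphone leans on its phone back-off when lam is small.
p_tri = {"a": 0.7, "b": 0.3}
p_phone = {"a": 0.5, "b": 0.4, "c": 0.1}
lam = estimate_lambda({"a": 3, "b": 2, "c": 1}, p_tri, p_phone)
print(round(lam, 3), smooth(p_tri, p_phone, lam))
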
“…To solve the ASR task, query utterance X is automatically divided into K syllables (Janakiraman et al., 2010) and the vowel segment X(k) is extracted from the k-th syllable (Pfau and Ruske, 1998). We assume that syllables are extracted without mistakes, e.g., that the voice commands are produced in isolated-syllable mode (Merialdo, 1988).…”
Section: Voice Command Recognition (mentioning)
confidence: 99%
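
The pipeline sketched in the statement above (splitting an isolated-syllable utterance into syllables and keeping the vowel-like core of each) can be illustrated with a simple energy-based segmentation. This is a generic sketch under that assumption, not the algorithms of the cited papers (Janakiraman et al., 2010; Pfau and Ruske, 1998).

import numpy as np

# Energy-based sketch: frames above an energy threshold form syllable regions,
# and a window around each region's energy peak stands in for the vowel core.

def short_time_energy(signal, frame=400, hop=160):
    """Per-frame energy of a mono waveform (e.g. 16 kHz, 25 ms / 10 ms)."""
    n = 1 + max(0, (len(signal) - frame) // hop)
    return np.array([np.sum(signal[i*hop:i*hop+frame] ** 2) for i in range(n)])

def split_syllables(energy, rel_threshold=0.1):
    """Contiguous frame runs whose energy exceeds a fraction of the maximum."""
    active = energy > rel_threshold * energy.max()
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments

def vowel_segment(energy, segment, width=5):
    """Frame window around the energy peak, used as the vowel-like core X(k)."""
    s, e = segment
    peak = s + int(np.argmax(energy[s:e]))
    return max(s, peak - width), min(e, peak + width)

# Usage with a synthetic two-burst signal standing in for two syllables.
rng = np.random.default_rng(0)
sig = np.concatenate([rng.normal(0, 1.0, 4000), np.zeros(2000),
                      rng.normal(0, 0.8, 4000)])
E = short_time_energy(sig)
for k, seg in enumerate(split_syllables(E)):
    print(k, seg, vowel_segment(E, seg))
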
“…In general, each node bears a time span, a label, and a score. Grids can also be used to represent an input text obtained by scanning a bad original, or a stenotypy tape [9], and to implement some working structures (like that of the Cocke algorithm).…”
Section: P6: Debugging Problems (mentioning)
confidence: 99%
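
The grid structure described above, where each node carries a time span, a label, and a score, can be sketched as a small data structure together with a generic best-path dynamic program. The class and function below are illustrative assumptions, not the Cocke-style parser referred to in the citation.

from dataclasses import dataclass
from collections import defaultdict

# Grid sketch: nodes span time intervals and carry a label and a score; the
# dynamic program finds the best-scoring chain of contiguous nodes.

@dataclass
class Node:
    start: int    # first time frame covered by this hypothesis
    end: int      # frame just past the hypothesis
    label: str    # e.g. a word or phone label
    score: float  # log-probability-style score (higher is better)

def best_path(nodes, t_start, t_end):
    """Highest-scoring sequence of contiguous nodes covering [t_start, t_end)."""
    by_start = defaultdict(list)
    for n in nodes:
        by_start[n.start].append(n)
    best = {t_start: (0.0, [])}          # end time -> (score, labels so far)
    for t in range(t_start, t_end):
        if t not in best:
            continue
        score, labels = best[t]
        for n in by_start[t]:
            cand = (score + n.score, labels + [n.label])
            if n.end not in best or cand[0] > best[n.end][0]:
                best[n.end] = cand
    return best.get(t_end)

# Usage: two competing segmentations of the same time span.
grid = [Node(0, 2, "speech", -1.0), Node(0, 1, "sp", -0.7),
        Node(1, 2, "each", -0.8), Node(2, 3, "lab", -0.5)]
print(best_path(grid, 0, 3))   # -> (-1.5, ['speech', 'lab'])
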