Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009
DOI: 10.3115/1620754.1620810

Assessing and improving the performance of speech recognition for incremental systems

Abstract: In incremental spoken dialogue systems, partial hypotheses about what was said are required even while the utterance is still ongoing. We define measures for evaluating the quality of incremental ASR components with respect to the relative correctness of the partial hypotheses compared to hypotheses that can optimize over the complete input, the timing of hypothesis formation relative to the portion of the input they are about, and hypothesis stability, defined as the number of times they are revised. We show …
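The abstract names the kinds of measure but gives no formulas. The Python sketch below shows one plausible way to operationalise relative correctness and stability on a sequence of word-level partial hypotheses; timing is left out because it needs word timestamps. The definitions are assumptions for illustration: a partial hypothesis counts as correct relative to the final output if it is a prefix of it, stability is measured by counting word additions and revocations between successive hypotheses, and `edit_overhead` (the share of edits not needed to build the final hypothesis) is a hypothetical helper, not necessarily the paper's exact definition.

```python
from typing import List, Sequence, Tuple

Hyp = List[str]  # one partial hypothesis: the words recognized so far


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading words shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def relative_correctness(partials: List[Hyp]) -> float:
    """Share of partial hypotheses that are prefixes of the final hypothesis.

    Assumption: the last element of `partials` is the output for the complete
    utterance and serves as the reference.
    """
    final = partials[-1]
    correct = sum(1 for h in partials if common_prefix_len(h, final) == len(h))
    return correct / len(partials)


def count_edits(partials: List[Hyp]) -> Tuple[int, int]:
    """Word additions and revocations needed to turn each hypothesis into the next."""
    adds = revokes = 0
    prev: Hyp = []
    for cur in partials:
        k = common_prefix_len(prev, cur)
        revokes += len(prev) - k  # words withdrawn from the previous hypothesis
        adds += len(cur) - k      # words newly (re)added
        prev = cur
    return adds, revokes


def edit_overhead(partials: List[Hyp]) -> float:
    """Hypothetical measure: share of all edits not needed for the final hypothesis."""
    adds, revokes = count_edits(partials)
    total = adds + revokes
    necessary = len(partials[-1])  # one addition per word of the final hypothesis
    return (total - necessary) / total if total else 0.0


if __name__ == "__main__":
    partials = [["one"], ["one", "two"], ["one", "too"],
                ["one", "two"], ["one", "two", "three"]]
    print(relative_correctness(partials))  # 0.8 ("one too" is not a prefix)
    print(count_edits(partials))           # (5, 2)
    print(edit_overhead(partials))         # 4 of 7 edits unnecessary, about 0.571
```

In this example the recognizer briefly switches "two" to "too" and back, which costs two revocations and two extra additions even though the final hypothesis has only three words.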

Cited by 27 publications (29 citation statements). References: 6 publications.
“…Our corpus hence totals 1687 utterances, with an average of 5.43 words per utterance (sd 2.36), and a vocabulary of 237 distinct words. We performed the experiments reported below both with manual transcriptions of the utterances as well as with ASR transcriptions (for which we used the version of Sphinx4 described in Baumann et al., 2009, with models fine-tuned to this domain, achieving a word error rate of 0.24).…”
Section: Data and Task (mentioning; confidence: 99%)
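The quoted corpus note reports a word error rate of 0.24. As a reminder of what that figure measures, here is a minimal Python sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between the reference and the recognized word sequence, divided by the number of reference words. It is an illustrative implementation, not the scoring tool used in the quoted work; pooling errors and reference words over the whole corpus, as the hypothetical `corpus_wer` helper does, is one common convention and an assumption here.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    that turn `ref` into `hyp` (Levenshtein distance over words)."""
    # d[i][j] = distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)]


def corpus_wer(pairs: List[Tuple[List[str], List[str]]]) -> float:
    """Total word errors divided by total reference words over all utterances."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_words = sum(len(ref) for ref, _ in pairs)
    return errors / ref_words


if __name__ == "__main__":
    pairs = [
        ("take the red cross".split(), "take a red cross".split()),  # 1 substitution
        ("put it there".split(), "put it".split()),                 # 1 deletion
    ]
    print(corpus_wer(pairs))  # 2 errors / 7 reference words
```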
“…In [6], the points at which partial hypotheses are computed are carefully selected to be at times when the ASR either has high confidence in the current word or the language model end of utterance symbol has been reached. In [8], additional right context is included before a partial hypothesis is returned, which introduces a short lag but improves stability.…”
Section: Incremental Dialogue (mentioning; confidence: 99%)
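The right-context strategy attributed to [8] can be illustrated with a small Python sketch. It assumes a hypothetical hypothesis format in which each word carries its end time in seconds, and a hypothetical `lag` parameter: words that end within the last `lag` seconds of audio processed so far are withheld until more right context has arrived. It shows the general trade of a short delay for stability, not the cited system's actual mechanism.

```python
from typing import List, Tuple

# Hypothetical format: (word, end time of the word in seconds), in time order.
TimedWord = Tuple[str, float]


def lagged_partial(words: List[TimedWord], now: float, lag: float) -> List[str]:
    """Return the words that ended at least `lag` seconds before `now`.

    More recent words are held back because the recognizer has seen little
    right context for them and is more likely to revise them.
    """
    released = []
    for word, end in words:
        if end > now - lag:
            break  # this word and all later ones are still too recent
        released.append(word)
    return released


if __name__ == "__main__":
    hyp = [("take", 0.30), ("the", 0.45), ("red", 0.80)]
    print(lagged_partial(hyp, now=0.90, lag=0.2))  # ['take', 'the']
    print(lagged_partial(hyp, now=0.90, lag=0.0))  # ['take', 'the', 'red']
```

Setting `lag` to 0 once the utterance has ended releases the full hypothesis, so the delay only affects partial results.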
“…Fink et al. [7] found that providing more right context (i.e., more acoustic information) could improve accuracy. Likewise, Baumann et al. [4] showed that increasing the language model weight of words in the lattice could improve accuracy. Selfridge et al. [25] took both of these ideas further and proposed an algorithm that looked for paths in the lattice that either terminated in an end-of-sentence (as deemed by the language model), or converged to a single node.…”
Section: Related Work (mentioning; confidence: 99%)
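The convergence criterion attributed to Selfridge et al. [25] can be made concrete with a minimal Python sketch of the convergence test only (the end-of-sentence criterion is not shown). It assumes a hypothetical lattice given as a directed acyclic graph of integer states, a start state, and the set of currently active frontier states. A state lies on every path from the start to the frontier exactly when (paths from start to the state) × (paths from the state to the frontier) equals the total number of such paths; the latest such state is where all hypotheses converge, and under a Viterbi-style search a recognizer could plausibly commit to its best path up to that point.

```python
from collections import deque
from typing import Dict, Iterable, List


def topological_order(edges: Dict[int, List[int]]) -> List[int]:
    """Kahn's algorithm; `edges` maps each state to its successor states."""
    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    indeg = {n: 0 for n in nodes}
    for vs in edges.values():
        for v in vs:
            indeg[v] += 1
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order: List[int] = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in edges.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order


def latest_convergence_node(edges: Dict[int, List[int]], start: int,
                            frontier: Iterable[int]) -> int:
    """Latest state (in topological order) that every start->frontier path visits."""
    order = topological_order(edges)
    # fwd[n]: number of paths from `start` to n
    fwd = {n: 0 for n in order}
    fwd[start] = 1
    for u in order:
        for v in edges.get(u, []):
            fwd[v] += fwd[u]
    # bwd[n]: number of paths from n that end at some frontier state
    ends = set(frontier)
    bwd: Dict[int, int] = {}
    for u in reversed(order):
        bwd[u] = (1 if u in ends else 0) + sum(bwd[v] for v in edges.get(u, []))
    total = bwd[start]
    if total == 0:
        raise ValueError("no path from start to the frontier")
    # A state lies on every path exactly when it carries all of them.
    on_all = [n for n in order if fwd[n] * bwd[n] == total]
    return on_all[-1]


if __name__ == "__main__":
    # 0 -take-> 1, then "the"/"a" diverge, reconverge at 4, then diverge again.
    lattice = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5, 6]}
    print(latest_convergence_node(lattice, 0, [5, 6]))  # 4
```

If the start state is returned, the active hypotheses share no convergence point yet and nothing beyond the start can be committed.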