2010
DOI: 10.1016/j.specom.2009.10.001

Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates

Abstract: Despite years of spee…

Cited by 125 publications
(120 citation statements)
References 27 publications
“…Thus, we restrict the vocabulary to 1/10th of the original size in order to obtain reasonable performance from the isolated ASR system. Following Goldwater et al (2010), we use a logistic regression model implemented using the glm function in R (R Development Core Team, 2005). The logistic regression model fits the log-odds of a binary response variable with a linear combination of one or more predictor variables.…”
Section: Results and Discussion: In Order To Individually Analyze Eac (mentioning)
confidence: 99%
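
The model described in the statement above can be illustrated with a minimal sketch. It is written in Python rather than R, and it fits the coefficients by plain gradient descent, not by the iteratively reweighted least squares that R's glm uses. The function names, the single predictor (word length), and the toy data are all assumptions made for illustration; none of them come from the cited papers.

```python
import math

def sigmoid(z):
    """Map log-odds z to a probability, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_odds(b, w, x):
    """Linear combination of predictors: the modeled log-odds of y = 1."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept b and weights w by batch gradient descent
    on the mean negative log-likelihood (hypothetical helper)."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(log_odds(b, w, xi)) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return b, w

# Toy data (invented): one predictor, word length in phones;
# response is 1 if the word was misrecognized, 0 otherwise.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [1, 1, 1, 0, 1, 0, 0, 0]

b, w = fit_logistic(X, y)
p_short = sigmoid(log_odds(b, w, [1]))  # estimated error probability, length 1
p_long = sigmoid(log_odds(b, w, [8]))   # estimated error probability, length 8
```

In this toy data the longer words are misrecognized less often, so the fitted weight is negative: each additional unit of the predictor lowers the log-odds of an error by a constant amount.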
“…2 Goldwater et al (2010) also consider the number of homophones (words that share a pronunciation with the target word) and frequency-weighted homophones as additional neighborhood measures. In our data there is insufficient homophony for these measures to be significant, so we do not report on experiments using them.…”
Section: Proposed Neighborhood Measures (mentioning)
confidence: 99%
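
The two homophone measures named in the statement above can be sketched as follows. The statement defines homophones as words that share a pronunciation with the target word but does not say how the frequency weighting is done. This sketch assumes the weighted measure is the summed corpus frequency of the homophones. The lexicon, the frequency counts, and the function name are hypothetical.

```python
from collections import defaultdict

def homophone_measures(lexicon, freq):
    """Return {word: (homophone count, summed homophone frequency)}.

    lexicon maps each word to a list of pronunciation strings;
    freq maps each word to a corpus count. Summing the counts is an
    assumed weighting, not one taken from the cited papers.
    """
    # Index words by pronunciation so homophones share a key.
    by_pron = defaultdict(set)
    for word, prons in lexicon.items():
        for pron in prons:
            by_pron[pron].add(word)

    measures = {}
    for word, prons in lexicon.items():
        homophones = set()
        for pron in prons:
            homophones |= by_pron[pron]
        homophones.discard(word)  # a word is not its own homophone
        weighted = sum(freq.get(h, 0) for h in homophones)
        measures[word] = (len(homophones), weighted)
    return measures

# Invented toy lexicon and counts.
lexicon = {
    "two": ["T UW"],
    "too": ["T UW"],
    "to": ["T UW", "T AH"],
    "cat": ["K AE T"],
}
freq = {"two": 5, "too": 3, "to": 20, "cat": 2}

measures = homophone_measures(lexicon, freq)
```

With this toy lexicon, "two" has two homophones ("too" and "to") and "cat" has none, which mirrors the low-homophony situation the citing authors describe for their data.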
“…The study reported in (Zhang and Rudnicky, 2001) included acoustic features, language model features, word lattice features, N-best features, and parser-based features derived from the language model features and the grammar (parsing-mode and slot-backoff-mode) as input features for three different post-classifiers (DT, neural network and support vector machine (SVM)) in an LVCSR system. Recent work (Goldwater et al, 2009) has proposed disfluency-based features, speaker sex, broad class-based features, turn boundary-based features, language model-based features, pronunciation-based features (word length, number of pronunciations, number of homophones, number of neighbors, and frequency-weighted homophones/neighbors), prosodic features (pitch, intensity, speech rate, duration and log jitter) and concluded that extreme prosodic values, words following a speaker turn and preceding disfluent interruption contribute most to a high word error rate (WER). To the best of our knowledge, there has not been such a systematic analysis of relevant features for STD.…”
Section: Feature Collection (mentioning)
confidence: 99%
“…Some of the features are motivated by the relevance analysis on ASR in (Goldwater et al, 2009) and others were selected due to properties of STD. Table 2 summarizes the features, where each feature is assigned a number to assist the presentation in the following sections.…”
Section: Features In Analysis (mentioning)
confidence: 99%