Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Com 2009
DOI: 10.3115/1620754.1620819

Improved pronunciation features for construct-driven assessment of non-native spontaneous speech

Abstract: This paper describes research on automatic assessment of the pronunciation quality of spontaneous non-native adult speech. Since the speaking content is not known prior to the assessment, a two-stage method is developed to first recognize the speaking content based on non-native speech acoustic properties and then forced-align the recognition results with a reference acoustic model reflecting native and near-native speech properties. Features related to Hidden Markov Model likelihoods and vowel durations are e…

Cited by 35 publications
(37 citation statements)
References 6 publications
“…For example, speaking rate and pause profile have been suggested as useful features in several studies and even in commercial speech-scoring product implementations (Franco et al., 2010; Bernstein, Moere, & Cheng, 2010). A widely used feature for verifying pronunciation is the Goodness of Pronunciation (GOP) tool and its derived features (Witt, 1999; Franco et al., 2010; Chen, Zechner, & Xi, 2009). In particular, following the feature extraction method described in Chen et al…”
Section: Speech Features (mentioning)
confidence: 99%
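
The citing passage names GOP without giving a formula. As a rough illustration only, GOP in the sense of Witt (1999) is commonly computed as the absolute log-likelihood difference between the forced-aligned phone and the best competing phone, divided by the segment length in frames. The function name, argument names, and numbers below are assumptions made for this sketch, not values from any of the cited papers.

```python
def goodness_of_pronunciation(forced_loglik, best_loglik, n_frames):
    """Duration-normalized GOP for one phone segment.

    forced_loglik: acoustic log-likelihood of the segment under the
                   phone given by forced alignment.
    best_loglik:   log-likelihood of the same frames under the best
                   phone from an unconstrained phone loop.
    n_frames:      number of acoustic frames in the segment.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return abs(forced_loglik - best_loglik) / n_frames


# Hypothetical segments: (phone, forced log-lik, best log-lik, frames).
segments = [
    ("AE", -120.0, -110.0, 10),  # a competing phone fits better
    ("T", -45.0, -45.0, 5),      # the expected phone is also the best fit
]
scores = {
    phone: goodness_of_pronunciation(forced, best, frames)
    for phone, forced, best, frames in segments
}
```

Under this formulation a score of 0 means the expected phone was also the acoustically best match, and larger values indicate a worse match.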
“…This is due to the fact that for most speakers different aspects of proficiency tend to be correlated. For example, more fluent speakers also achieve higher ROUGE scores (the correlation between ROUGE and pronunciation accuracy (Chen et al, 2009) is r = 0.62). As a result, a model which measures only one aspect of performance such as fluency may sometimes reach near optimal performance and adding further predictors leads to a relatively small gain.…”
Section: Metric (mentioning)
confidence: 99%
“…For the non-native speech corpus, the ASR based vowel duration extraction uses a two-pass method [5]: first the utterance is recognized using a non-native speech acoustic model (AM); then a native speech AM is used for forced alignment. This study also examines the performance of the vowel duration features when human transcriptions are used as input to the forced alignment instead of the ASR hypotheses; in this case, the ASR step is bypassed.…”
Section: Pronunciation Scoring 31 Vowel Duration Extraction and Norm (mentioning)
confidence: 99%
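
The two-pass flow described in this statement can be sketched as follows. This is a minimal outline under stated assumptions: `recognize` and `force_align` are hypothetical callables standing in for a non-native-AM recognizer and a native-AM forced aligner, the vowel set is an illustrative subset, and the stub functions exist only to make the example runnable.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[str, float, float]  # (phone, start_sec, end_sec)

# Illustrative subset of vowel labels; a real phone set would be larger.
VOWELS = {"AA", "AE", "AH", "IY", "UW"}


def vowel_durations(
    audio: bytes,
    recognize: Callable[[bytes], List[str]],
    force_align: Callable[[bytes, List[str]], List[Segment]],
    transcript: Optional[List[str]] = None,
) -> List[Tuple[str, float]]:
    """Pass 1: get the word sequence; pass 2: align it and keep vowels.

    If a human transcript is supplied, the ASR pass is bypassed.
    """
    words = transcript if transcript is not None else recognize(audio)
    segments = force_align(audio, words)
    return [
        (phone, round(end - start, 3))
        for phone, start, end in segments
        if phone in VOWELS
    ]


# Stub components (hypothetical) so the sketch runs end to end.
def fake_recognize(audio: bytes) -> List[str]:
    return ["cat"]


def fake_align(audio: bytes, words: List[str]) -> List[Segment]:
    return [("K", 0.0, 0.08), ("AE", 0.08, 0.25), ("T", 0.25, 0.33)]


durations = vowel_durations(b"", fake_recognize, fake_align)
```

Passing `transcript=["cat"]` skips `recognize` entirely, which mirrors the condition in which human transcriptions replace the ASR hypotheses.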
“…In a simple approach, [4] correlated mean raw segment duration values with pronunciation scores (although this metric was also highly correlated with speaking rate). [5,6] calculated the average deviations between a non-native speaker's segments and mean segment durations trained on a native speaker corpus. In another approach, [2,6] used the average log probability of segment durations based on native speaker distributions of durations for each phone (after normalizing for rate of speech).…”
Section: Introduction (mentioning)
confidence: 99%
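
The two duration features mentioned in this statement (average deviation from native mean durations, and average log probability under native duration distributions) might be computed along these lines. The native statistics are invented for illustration, the Gaussian duration model is an assumption (the statement does not say which distribution was used), and the rate-of-speech normalization is omitted.

```python
import math

# Hypothetical native-speaker statistics: phone -> (mean_sec, std_sec).
NATIVE_STATS = {"AE": (0.12, 0.03), "IY": (0.10, 0.02)}


def mean_abs_deviation(vowels, stats):
    """Average |duration - native mean| over vowels with known stats."""
    devs = [abs(d - stats[p][0]) for p, d in vowels if p in stats]
    return sum(devs) / len(devs) if devs else float("nan")


def mean_log_prob(vowels, stats):
    """Average Gaussian log-density of each duration (assumed model)."""
    logps = []
    for phone, dur in vowels:
        if phone not in stats:
            continue
        mu, sigma = stats[phone]
        logps.append(
            -0.5 * math.log(2 * math.pi * sigma ** 2)
            - (dur - mu) ** 2 / (2 * sigma ** 2)
        )
    return sum(logps) / len(logps) if logps else float("nan")


# Hypothetical speakers: (phone, duration_sec) pairs.
native_like = [("AE", 0.12), ("IY", 0.10)]
deviant = [("AE", 0.18), ("IY", 0.10)]

deviation = mean_abs_deviation(deviant, NATIVE_STATS)
```

A speaker whose vowel durations sit close to the native means gets a smaller deviation and a higher average log probability than one whose durations drift away from them.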