Automatic pronunciation scoring of words and sentences independent from the non-native’s first language

Cincarek, Tobias; Gruhn, Rainer; Hacker, Christian; Nöth, Elmar; Nakamura, Satoshi

doi:10.1016/j.csl.2008.03.001

Cited by 63 publications

(36 citation statements)

References 14 publications

“…3 performance measures are used [3], they are: (1) the correlation coefficient (COR) we mentioned in Equation 1; (2) the class-wise average recognition rate (CL), which is the accuracy of the sentences been classified correctly; (3) the average recognition rate tolerating ±1 neighbor classes (CL-1A).…”

Section: Results (mentioning)

confidence: 99%
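
The three measures named in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code. It assumes integer proficiency classes, reads "class-wise average recognition rate" as the mean of the per-class hit rates, and treats a prediction as correct for CL-1A when it is at most one class away from the reference. The function names `pearson` and `classwise_rate` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (COR) between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def classwise_rate(reference, predicted, tolerance=0):
    """Mean of per-class hit rates (assumed reading of CL).

    tolerance=0 counts exact matches only (CL);
    tolerance=1 also accepts the neighbouring classes (CL-1A).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ref, hyp in zip(reference, predicted):
        totals[ref] += 1
        if abs(ref - hyp) <= tolerance:
            hits[ref] += 1
    # Average over the classes that occur in the reference labels.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data, invented for illustration only.
human = [1, 1, 2, 3, 3, 5]
machine = [1, 2, 2, 3, 5, 4]
cor = pearson(human, machine)
cl = classwise_rate(human, machine)
cl_1a = classwise_rate(human, machine, tolerance=1)
```

Averaging per class rather than per utterance keeps a frequent class from dominating the score, which matters when the proficiency labels are unevenly distributed.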

“…Many word and sentence level features have been proposed [5] [6] and some of them are derived from the integration of phone level features. Many features achieve good results with foreign language learners [3] [4]. However, our works focus on the native pronunciation evaluation system which is used to test the mandarin Putonghua pronunciation proficiency of Chinese dialectal speakers.…”

Section: Introduction (mentioning)

confidence: 99%

Text-independent pronunciation evaluation based on phone-level Gaussian classifier

Geng, Miao

2010

IEEE 10th International Conference on Signal Processing Proceedings

This paper presents a novel text-independent pronunciation evaluation method. The method aims at testing the Mandarin (Putonghua) pronunciation proficiency of dialectal speakers from every area of China. In the proposed method, 5 pronunciation proficiency levels are assigned to each phone; accordingly, 5 single-Gaussian classifiers are created to represent the 5 levels for a phone. 4 phone-level features are described and used in the classifiers. In the testing step, phone-level output probabilities are combined to obtain the sentence-level likelihood.
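
The scheme described in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the abstract does not give the features, the covariance structure, or the combination rule, so the code assumes diagonal-covariance Gaussians and combines phones by summing log-likelihoods per level. The names `log_gauss` and `sentence_level`, and all model parameters, are hypothetical.

```python
import math


def log_gauss(features, mean, var):
    """Log density of a diagonal-covariance Gaussian (an assumption)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(features, mean, var)
    )


def sentence_level(phones, models):
    """Pick the proficiency level with the highest summed log-likelihood.

    phones: list of (phone_label, feature_vector) for one sentence.
    models: {phone_label: {level: (mean, var)}}, one Gaussian per level.
    Summing log-likelihoods over phones is an assumed combination rule.
    """
    totals = {}
    for label, features in phones:
        for level, (mean, var) in models[label].items():
            totals[level] = totals.get(level, 0.0) + log_gauss(features, mean, var)
    best = max(totals, key=totals.get)
    return best, totals


# Toy models: 5 levels per phone, 4 features each, invented values.
toy_models = {
    "a": {level: ([float(level)] * 4, [1.0] * 4) for level in range(1, 6)},
}
toy_sentence = [("a", [3.0] * 4), ("a", [3.1] * 4)]
best_level, level_scores = sentence_level(toy_sentence, toy_models)
```

Working in the log domain avoids numerical underflow when a sentence contains many phones.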

“…The speech rate estimation typically involves identification of the syllable nuclei locations followed by syllable rate computation (Reddy et al., 2013). Generally the approaches for the speech rate estimation and the syllable nuclei detection are based on either acoustic features (Heinrich and Schiel, 2011; Morgan et al., 1997; Reddy et al., 2013; Wang and Narayanan, 2007) or hidden Markov model (HMM) based recognition systems (Cincarek et al., 2009; Cucchiarini et al., 2000; Hönig et al., 2012; Yuan and Liberman, 2010).…”

Section: Introduction (mentioning)

confidence: 99%

A mode-shape classification technique for robust speech rate estimation and syllable nuclei detection

Yarra, Deshmukh, Ghosh

2016

Speech Communication

Acoustic feature based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer assisted language learning (CALL) and fluency analysis. A typical solution for both problems consists of two stages. The first stage involves computing a short-time feature contour such that most of the peaks of the contour correspond to the syllabic nuclei. In the second stage, the peaks corresponding to the syllable nuclei are detected. In this work, instead of peak detection, we perform a mode-shape classification, which is formulated as a supervised binary classification problem: mode-shapes representing the syllabic nuclei form one class and the remaining mode-shapes form the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour, and the mode-shapes in the TCSSBC feature contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately using the Switchboard, TIMIT and CTIMIT corpora in a five-fold cross validation setup. The average correlation coefficients for the syllable rate estimation turn out to be 0.6761, 0.6928 and 0.3604 for the three corpora respectively, which outperform those obtained by the best of the existing peak detection techniques. Similarly, the average F-scores (syllable level) for the syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for the three corpora respectively.

“…various acoustic features, such as pause duration, can be extracted and combined to compute a global score (Cucchiarini et al., 2000a, 2000b, 2002; Cincarek et al., 2009; Zechner et al., 2009). In addition, log-posterior probability scores from native speakers' acoustic models and segment duration scores are also used as features for score computation (Franco et al., 1997; Neumeyer et al., 2000).…”

unclassified

Automatic pronunciation assessment of English produced by Korean learners using articulatory features

Ryu, Chung

2016

Phonetics and Speech Sciences
