2015
DOI: 10.1109/taslp.2015.2409733

Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection

Abstract: We carry out an in-depth investigation on a newly proposed Maximum F1-score Criterion (MFC) discriminative training objective function for Goodness of Pronunciation (GOP) based automatic mispronunciation detection that makes use of Gaussian mixture model-hidden Markov model (GMM-HMM) as acoustic models. The formulation of MFC seeks to directly optimize F1-score by converting the non-differentiable F1-score function into a continuous objective function to facilitate optimization. We present model-space training…
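The idea of turning the non-differentiable F1-score into a continuous objective can be illustrated with a small sketch. This is not the paper's MFC formulation, which the truncated abstract does not spell out. It is one common smoothing, assumed here for illustration: the hard decision "score below threshold" is replaced by a sigmoid, so the true-positive and predicted-positive counts become smooth functions of the scores and the threshold. The function names, the `slope` parameter, and the convention that a lower score means "more likely mispronounced" are all assumptions.

```python
import math


def hard_f1(scores, labels, threshold):
    """F1 with a hard decision: a phone is flagged when score < threshold.

    labels: 1 = mispronounced (positive class), 0 = correctly pronounced.
    """
    flagged = [1 if s < threshold else 0 for s in scores]
    tp = sum(f * y for f, y in zip(flagged, labels))
    denom = sum(flagged) + sum(labels)  # (TP + FP) + (TP + FN)
    return 2.0 * tp / denom if denom else 0.0


def soft_f1(scores, labels, threshold, slope=5.0):
    """Smoothed F1: the step function is replaced by a sigmoid.

    Illustrative assumption, not the paper's exact objective. A larger
    slope makes the sigmoid closer to the hard decision.
    """
    # Soft "flagged" value in (0, 1) for each phone.
    soft = [1.0 / (1.0 + math.exp(-slope * (threshold - s))) for s in scores]
    soft_tp = sum(p * y for p, y in zip(soft, labels))
    denom = sum(soft) + sum(labels)
    return 2.0 * soft_tp / denom if denom else 0.0


# Hypothetical GOP-like scores and labels, for illustration only.
scores = [-3.0, -0.5, -2.5, -0.2]
labels = [1, 0, 1, 0]
print(hard_f1(scores, labels, threshold=-1.5))
print(soft_f1(scores, labels, threshold=-1.5, slope=5.0))
```

Because `soft_f1` is differentiable in both the scores and the threshold, it could in principle be maximized by gradient-based methods, whereas `hard_f1` changes only in jumps.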

Citations: cited by 108 publications (46 citation statements)
References: 32 publications
“…In this paper, we explore to learn the DNN-HMM based acoustic models, as well as the decision function, with a discriminative objective that is directly linked to the ultimate evaluation metric of mispronunciation detection. Here we take the F1-score for investigation, since it was frequently adopted as the evaluation metric in previous work on mispronunciation detection [22][23][24]. Further, in this paper, the parameters of the decision function is set to be either phone- or senone-dependent when the phone-level (cf.…”
Section: Maximum F1-score Criterion Training
Citation type: mentioning
confidence: 99%
“…Yet there still are a wide array of studies that capitalize on various acoustic and prosodic cues, confidence measures and speaking-style information, to name just a few, for use in mispronunciation detection. Interested readers may also refer to [13][14][15][16][17] for comprehensive and enjoyable overviews of state-of-the-art methods that have been successfully developed and applied to various mispronunciation detection tasks.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Precision and Recall are measurements originated from Information Recovery and used in Classification when working with non-balanced classes. Precision is the percentage of instances that were correctly classified as positive among all of the data that were classified as positive, while Recall is the percentage of instances that were correctly classified as positive among the ones that really were positive, and F1-score is the harmonic mean between precision and recall [26]. The advantage of the F1-score is that it offers only one quality metric, facilitating a better understanding for end users.…”
Section: Assessment Of Classification Results / Confusion Matrix
Citation type: mentioning
confidence: 99%
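The definitions quoted above can be written out as a short sketch. It computes precision, recall, and their harmonic mean from confusion-matrix counts. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    tp: instances correctly classified as positive
    fp: instances wrongly classified as positive
    fn: positive instances that were missed
    Returning 0.0 for an undefined ratio is an assumed convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, f1)
```

With these counts, precision is 8/10 and recall is 8/16, so F1 works out to 8/13. The harmonic mean sits closer to the lower of the two values, which is why F1 penalizes a classifier that trades one of them away for the other.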
“…The number of selected features was constant: N = 100. In order to compare our method with the old one, we used F1 score [11] of SVM classifier.…”
Section: Methods
Citation type: mentioning
confidence: 99%