Interspeech 2021
DOI: 10.21437/interspeech.2021-777
Explore wav2vec 2.0 for Mispronunciation Detection

Cited by 34 publications (14 citation statements)
References 8 publications
“…Examples include contrastive predictive coding (CPC) [16,29], auto-regressive predictive coding [30], wav2vec [31], HuBERT [32,33], wav2vec 2.0 [12,34] and WavLM [35], with all showing promising results for a variety of different speech processing tasks. Two particularly popular approaches, HuBERT and wav2vec 2.0, have been applied to automatic speech recognition [12,13], mispronunciation detection [36,37], speaker recognition [38,39] and emotion recognition [40]. The same techniques have been explored in the context of spoofing detection [20,21].…”
Section: Related Work
confidence: 99%
“…Goodness-of-pronunciation (GOP) is among the first DNN-based methods for MDD, which relies on phone posterior outputs from an automatic speech recognizer (ASR) [1,2,23] to evaluate phonetic errors. More recently, end-to-end phoneme recognition has been studied [5,3,4,8,24], among which [4] and [24] also explored fine-tuning Wav2vec 2.0. Our proposed method differs from them in that we investigate the use of unlabeled target-domain speech to enhance MDD performance.…”
Section: Related Work
confidence: 99%
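To make the GOP idea in the excerpt above concrete, the sketch below scores a canonical phone by its average log posterior over the frames aligned to it; the toy posteriors and phone indices are illustrative assumptions, and this is a simplified form of GOP rather than the exact formulation used in the cited works.

```python
# Simplified goodness-of-pronunciation (GOP) sketch: average log posterior of
# the canonical phone over the frames aligned to it. Toy numbers only.
import numpy as np

def gop_score(frame_posteriors: np.ndarray, canonical_phone: int) -> float:
    """frame_posteriors: (T, num_phones) softmax outputs for one aligned segment."""
    log_post = np.log(frame_posteriors[:, canonical_phone] + 1e-10)
    return float(log_post.mean())  # more negative => more likely mispronounced

# Three frames aligned to canonical phone index 0, but the recognizer puts
# most of its probability mass on phone 1 (a likely substitution error).
posteriors = np.array([[0.10, 0.70, 0.10, 0.10],
                       [0.20, 0.60, 0.10, 0.10],
                       [0.10, 0.80, 0.05, 0.05]])
print(gop_score(posteriors, canonical_phone=0))  # ≈ -2.07, a low score
```

A low GOP score for a segment flags the corresponding canonical phone as a candidate phonetic error.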
“…The network learns speech representations from raw audio which can be used in downstream tasks such as speech recognition. Speech representation learning models such as Wav2Vec 2.0 [2] and HuBERT [3] have shown that learned representations produce state-of-the-art results on a variety of speech tasks: speaker and language identification [4], emotion recognition [5], spoofing speech detection [6], and second-language mispronunciation detection [7]. However, there is a lack of studies regarding the benefits of Wav2Vec or other speech representation models for impaired-speech tasks such as speech recognition or diagnosis.…”
Section: Introduction
confidence: 99%
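As a rough illustration of the fine-tuning route mentioned in these excerpts (and in the paper under review), the sketch below attaches a phoneme-level CTC head to a pre-trained wav2vec 2.0 checkpoint; the checkpoint name, phoneme-inventory size, and random batch are assumptions for illustration only, not the setup of any cited work.

```python
# Sketch: fine-tuning wav2vec 2.0 with a phoneme-level CTC head, the kind of
# setup used for end-to-end phoneme recognition / mispronunciation detection.
import torch
from transformers import Wav2Vec2ForCTC

NUM_PHONEMES = 42  # hypothetical phoneme inventory; index 0 reserved for the CTC blank

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-base",
    vocab_size=NUM_PHONEMES,
    ctc_loss_reduction="mean",
    ignore_mismatched_sizes=True,  # re-initialise the CTC head for the new vocabulary
)
model.freeze_feature_encoder()  # keep the convolutional front-end fixed

# One dummy optimisation step on random audio and phoneme labels.
input_values = torch.randn(1, 16000)              # ~1 s of 16 kHz audio
labels = torch.randint(1, NUM_PHONEMES, (1, 20))  # avoid the blank index 0
loss = model(input_values=input_values, labels=labels).loss
loss.backward()
print(float(loss))
```

In a real MDD setup the model would be fine-tuned on annotated learner speech, and mispronunciations detected by comparing the recognised phoneme sequence against the canonical one.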