2022
DOI: 10.1109/jstsp.2022.3195367

Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels

Abstract: End-to-end automatic speech recognition (ASR) has become a popular alternative to traditional module-based systems, simplifying the model-building process with a single deep neural network architecture. However, the training of end-to-end ASR systems is generally data-hungry: a large amount of labeled data (speech-text pairs) is necessary to learn direct speech-to-text conversion effectively. To make the training less dependent on labeled data, pseudo-labeling, a semi-supervised learning approach, has been succ…

Cited by 15 publications (9 citation statements). References 76 publications.
“…transcription generated by some method. Different ways of inferring pseudo-labels PL(x; θ) have been proposed [22, 31, 38, 26, 29, 18, 7], including both greedy and beam-search decoding, with or without an external LM, and with variants on the "teacher" AM model θ. IPL [38] and slimIPL [26] are continuous PL approaches, where a single AM (with parameters θ) is continuously trained.…”
Section: Acoustic (AM) and Language (LM) Models (mentioning)
confidence: 99%
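The excerpt above mentions inferring pseudo-labels PL(x; θ) by greedy or beam-search decoding over the acoustic model's outputs. As an illustration only, not the cited papers' exact implementation, a minimal greedy CTC-style decoder in PyTorch could look like the sketch below; the blank index, the model name `am`, and the tensor shapes are assumptions.

```python
# Illustrative greedy CTC-style pseudo-label inference PL(x; theta).
# BLANK_ID and the acoustic model `am` are hypothetical placeholders.
import torch

BLANK_ID = 0  # assumed index of the CTC blank token

def greedy_pseudo_label(log_probs: torch.Tensor) -> list:
    """Collapse repeats and drop blanks from framewise argmax predictions.

    log_probs: (time, vocab) log-probabilities from the acoustic model.
    """
    best = log_probs.argmax(dim=-1).tolist()      # framewise best tokens
    collapsed, prev = [], None
    for tok in best:
        if tok != prev and tok != BLANK_ID:       # standard CTC collapse rule
            collapsed.append(tok)
        prev = tok
    return collapsed

# Usage with a hypothetical acoustic model `am` on an unlabeled utterance `x`:
# with torch.no_grad():
#     pl = greedy_pseudo_label(am(x).log_softmax(-1).squeeze(0))
```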
“…The two dominant methods for leveraging unlabeled audio are unsupervised pre-training via self-supervision (SSL) [6, 19, 11, 4] and semi-supervised self-training [22, 38, 26, 29, 16, 18], or pseudo-labeling (PL). In pre-training, a model is trained to process the raw unlabeled data to extract features that solve some pretext task, followed by supervised fine-tuning on some downstream ASR task.…”
Section: Introduction (mentioning)
confidence: 99%
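The pre-training recipe described in this excerpt, a pretext task on raw unlabeled audio followed by supervised fine-tuning, can be sketched end to end. The toy sketch below uses random tensors, a masked-frame reconstruction pretext task, and a CTC head; every module size and hyperparameter is illustrative and not taken from any cited system.

```python
# Two-stage sketch: self-supervised pre-training, then supervised fine-tuning.
import torch
import torch.nn as nn

FEAT, HID, VOCAB = 80, 128, 32

encoder = nn.GRU(FEAT, HID, batch_first=True)   # shared speech encoder
recon_head = nn.Linear(HID, FEAT)               # pretext head: reconstruct masked frames
ctc_head = nn.Linear(HID, VOCAB)                # downstream head: CTC over output tokens

# Stage 1: self-supervised pre-training on (toy) unlabeled features.
opt = torch.optim.Adam(list(encoder.parameters()) + list(recon_head.parameters()), lr=1e-3)
for _ in range(10):
    x = torch.randn(4, 100, FEAT)               # unlabeled batch (batch, time, feat)
    mask = torch.rand(4, 100, 1) < 0.3          # mask roughly 30% of the frames
    h, _ = encoder(x.masked_fill(mask, 0.0))
    loss = ((recon_head(h) - x) ** 2)[mask.expand(-1, -1, FEAT)].mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: supervised fine-tuning of the same encoder on (toy) labeled pairs.
ctc = nn.CTCLoss(blank=0)
opt = torch.optim.Adam(list(encoder.parameters()) + list(ctc_head.parameters()), lr=1e-4)
for _ in range(10):
    x = torch.randn(4, 100, FEAT)               # labeled batch
    y = torch.randint(1, VOCAB, (4, 20))        # token targets (no blanks)
    h, _ = encoder(x)
    log_probs = ctc_head(h).log_softmax(-1).transpose(0, 1)   # (time, batch, vocab)
    lengths = (torch.full((4,), 100, dtype=torch.long), torch.full((4,), 20, dtype=torch.long))
    loss = ctc(log_probs, y, *lengths)
    opt.zero_grad(); loss.backward(); opt.step()
```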
“…Another technique is the use of an exponential moving average (EMA) of the acoustic model to generate the pseudo-labels in Eq. (2) (Likhomanenko et al., 2021a; Manohar et al., 2021; Higuchi et al., 2021, 2022b; Zhang et al., 2022).…”
Section: Experimental Setup and Related Methods (mentioning)
confidence: 99%
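The EMA technique referred to above keeps a separate "teacher" copy of the acoustic model whose parameters track a momentum-weighted average of the student's and which generates pseudo-labels without receiving gradients. A minimal PyTorch sketch, with an assumed momentum value rather than the paper's exact setting, follows.

```python
# Minimal sketch of an EMA (momentum) teacher for pseudo-label generation.
import copy
import torch

def make_ema_teacher(student: torch.nn.Module) -> torch.nn.Module:
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)           # teacher is never trained by backprop
    return teacher

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               momentum: float = 0.999) -> None:
    # theta_teacher <- momentum * theta_teacher + (1 - momentum) * theta_student
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```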
“…[table fragment: WER 14.3 / 17.1 / 22.9 at 10k / 20k / 40k steps] …(teacher-student) (Kahn et al., 2020a); here, pseudo-labels (PLs) are generated online with a very recent version of the model (Likhomanenko et al., 2021a; Manohar et al., 2021; Higuchi et al., 2021, 2022a) and training is faster and more resource-efficient. One of the main challenges for continuous ST is training stability (Likhomanenko et al., 2021a; Higuchi et al., 2021, 2022b; Cai et al., 2022). While these prior works use various techniques for stabilization, one common ingredient is that models are initially trained on labeled data for M steps.…”
(mentioning)
confidence: 99%
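The continuous self-training recipe described above, M supervised warm-up steps followed by online pseudo-label generation with a very recent version of the model, can be written as a single training loop. The sketch below uses a toy linear model and random tensors; M, the step counts, and the batch shapes are placeholders, not the settings of any cited work.

```python
# Hedged sketch of continuous self-training with a supervised warm-up.
import torch
import torch.nn as nn

model = nn.Linear(16, 10)                        # stand-in for the acoustic model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
M, TOTAL = 50, 200                               # assumed warm-up and total step counts

for step in range(TOTAL):
    if step < M:
        # Supervised warm-up: labeled (toy) batch for the first M steps.
        x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
    else:
        # Continuous self-training: pseudo-labels generated online by the
        # current (very recent) version of the model on unlabeled data.
        x = torch.randn(8, 16)
        with torch.no_grad():
            y = model(x).argmax(dim=-1)
    loss = loss_fn(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```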
“…It utilizes labeled samples to predict the class of unlabeled samples and integrates labeled and pseudo-labeled samples to train the network. Semi-supervised learning methods based on pseudo-labels have gradually been applied to automatic speech recognition [26] and image semantic segmentation [27]. To overcome the limitations of the traditional supervised SAE and to improve its generalization performance, the pseudo-label-based semi-supervised stacked autoencoder (PL-SSAE) is proposed by combining the SAE with pseudo-labels.…”
Section: Introduction (mentioning)
confidence: 99%
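The generic pseudo-labeling recipe this excerpt describes, train on labeled samples, predict classes for unlabeled samples, and merge confident predictions back into the training set, can be illustrated independently of the PL-SSAE architecture. The scikit-learn sketch below is a generic stand-in; the classifier choice and the confidence threshold are assumptions, not the cited method.

```python
# One round of generic pseudo-labeling with a confidence threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label_round(x_lab, y_lab, x_unlab, threshold=0.9):
    """Fit on labeled data, label unlabeled data, keep confident predictions,
    and refit on the merged set."""
    clf = LogisticRegression(max_iter=1000).fit(x_lab, y_lab)
    proba = clf.predict_proba(x_unlab)
    keep = proba.max(axis=1) >= threshold         # confident unlabeled samples only
    x_aug = np.concatenate([x_lab, x_unlab[keep]])
    y_aug = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
    return LogisticRegression(max_iter=1000).fit(x_aug, y_aug)

# Toy usage with random features and two classes:
rng = np.random.default_rng(0)
x_lab, y_lab = rng.normal(size=(40, 5)), rng.integers(0, 2, 40)
x_unlab = rng.normal(size=(200, 5))
model = pseudo_label_round(x_lab, y_lab, x_unlab)
```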