ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414058
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification

Abstract: Semi-supervised learning has demonstrated promising results in automatic speech recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for unlabeled data. The effectiveness of this approach largely relies on the pseudo-label accuracy, for which typically only the 1-best ASR hypothesis is used. However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model. In this paper…
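The abstract's core idea, keeping an N-best list of seed-model hypotheses rather than collapsing them to a single 1-best pseudo-label, can be pictured with a small sketch. This is only an illustration under assumptions: the function nbest_to_graph and its union-of-linear-paths structure below are hypothetical and are not the paper's actual graph construction for the GTC loss.

from typing import Dict, List, Tuple

def nbest_to_graph(nbest: List[Tuple[List[str], float]]) -> Dict:
    # Build a union-of-paths label graph: node 0 is the shared start state,
    # node 1 the shared end state, and every hypothesis contributes one
    # linear chain of token arcs between them, annotated with its score.
    arcs: List[Tuple[int, int, str, float]] = []
    next_node = 2
    for tokens, score in nbest:
        prev = 0
        for i, tok in enumerate(tokens):
            dst = 1 if i == len(tokens) - 1 else next_node
            arcs.append((prev, dst, tok, score))
            if dst != 1:
                prev, next_node = dst, next_node + 1
    return {"start": 0, "end": 1, "arcs": arcs}

if __name__ == "__main__":
    # Two toy hypotheses for one unlabeled utterance, with log-scores.
    nbest = [(["the", "cat", "sat"], -1.2),
             (["a", "cat", "sat"], -2.5)]
    print(nbest_to_graph(nbest))

Such a graph keeps the alternatives (and their scores) available to a graph-based training criterion instead of committing to a possibly erroneous 1-best transcript.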

Cited by 22 publications (10 citation statements)
References 35 publications
“…Moreover, MPL was shown to be effective independently of the amount of unlabeled data or domain mismatch. Future work includes applying filtering techniques [31] and introducing multiple hypotheses [32] in the MPL process.…”
Section: Discussion (mentioning, confidence: 99%)
“…We focus on self-training [21] or pseudo-labeling (PL) [22], which has recently been adopted for semi-supervised E2E ASR and shown to be effective [23][24][25][26][27][28][29][30][31][32]. In PL, a teacher (base) model is first trained on labeled data and used to generate pseudo-labels for unlabeled data.…”
Section: Introduction (mentioning, confidence: 99%)
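The quoted passage summarizes the basic pseudo-labeling (PL) recipe. Below is a minimal sketch of that loop, assuming toy stand-in functions (train_supervised, transcribe) that are not part of any specific toolkit; a real system would decode with the seed ASR model and, as in the paper above, could keep N-best hypotheses instead of only the 1-best transcript.

from typing import Dict, List, Tuple

def train_supervised(data: List[Tuple[str, str]]) -> Dict:
    # Stand-in for training an ASR model on (audio, transcript) pairs.
    return {"trained_on": len(data)}

def transcribe(model: Dict, audio: str) -> str:
    # Stand-in for decoding; returns a 1-best hypothesis used as a pseudo-label.
    return f"pseudo transcript for {audio}"

def self_training(labeled: List[Tuple[str, str]],
                  unlabeled: List[str],
                  rounds: int = 2) -> Dict:
    model = train_supervised(labeled)                            # 1) seed/teacher model
    for _ in range(rounds):
        pseudo = [(x, transcribe(model, x)) for x in unlabeled]  # 2) pseudo-label unlabeled data
        model = train_supervised(labeled + pseudo)               # 3) retrain on the union
    return model

if __name__ == "__main__":
    labeled = [("utt1.wav", "hello world"), ("utt2.wav", "good morning")]
    unlabeled = ["utt3.wav", "utt4.wav"]
    print(self_training(labeled, unlabeled))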
“…In this section, we describe the extended GTC loss function. For the convenience of understanding, we mostly follow the notations in the previous GTC study [14].…”
Section: Extended GTC (GTC-e) (mentioning, confidence: 99%)
“…As in GTC [14], we can apply a beam search algorithm during decoding. Since the output of GTC-e contains tokens from multiple speakers, we need to make modifications to the existing time-synchronous prefix beam search algorithm [25,26].…”
Section: Multi-Speaker ASR and Beam Search (mentioning, confidence: 99%)