2021
DOI: 10.48550/arxiv.2109.07327
Preprint

Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning

Abstract: Recently, self-supervised learning has emerged as an effective approach to improving the performance of automatic speech recognition (ASR). Under such a framework, the neural network is usually pre-trained on massive unlabeled data and then fine-tuned on limited labeled data. However, to achieve competitive results the network usually adopts a non-streaming architecture such as a bidirectional transformer, which cannot be used in streaming scenarios. In this paper, we mainly focus on improving the …

Citations: cited by 1 publication (1 citation statement)
References: 24 publications (37 reference statements)
“…The idea of training an ASR system based on its own transcription is closely related to the widely applied pseudo-labeling method in the field of ASR, which involves using a model’s own transcription of degraded speech as a supervision signal to adapt the model [10, 11, 12, 13]. Recent advances in DNN-based ASR systems have offered a potential tool to investigate the computational strategy underlying speech recognition behavior, as these systems have reached human-level speech recognition performance in many scenarios [14, 15]. Therefore, despite dramatic differences in the implementation of ASR systems and the human brain, we utilize the ASR system to probe the computational-level principle behind the rapid human adaptation to acoustically degraded speech.…”
Section: Introduction (mentioning)
confidence: 99%