2024
DOI: 10.1007/s11063-024-11614-z

Multi-view Self-supervised Learning and Multi-scale Feature Fusion for Automatic Speech Recognition

Jingyu Zhao, Ruwei Li, Maocun Tian, et al.

Abstract: To address the challenges of the poor representation capability and low data utilization rate of end-to-end speech recognition models in deep learning, this study proposes an end-to-end speech recognition model based on multi-scale feature fusion and multi-view self-supervised learning (MM-ASR). It adopts a multi-task learning paradigm for training. The proposed method emphasizes the importance of inter-layer information within shared encoders, aiming to enhance the model’s characterization capability via the …

Cited by 1 publication
References 48 publications