2021
DOI: 10.48550/arxiv.2111.01690
Preprint

Recent Advances in End-to-End Automatic Speech Recognition

Abstract: The speech community is seeing a significant trend of moving from deep-neural-network-based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art ASR accuracy on most benchmarks, hybrid models are still used in a large proportion of commercial ASR systems. Many practical factors affect the decision of which model to deploy in production. Traditional hybrid models, being optimize…

Citations: cited by 14 publications (11 citation statements)
References: 235 publications (307 reference statements)
“…Jinyu Li [28] gave a detailed overview of E2E models and feasible technologies that makes E2E models to outperform hybrid models in the industry world.…”
Section: B Deep Learning Based Methods For Automatic Speech Recogniti... (mentioning)
confidence: 99%
“…ASR [16]. Based on HuBERT encoder, our proposed Speech2C model can also pre-train a Transformer decoder with pseudo label from the clustering model.…”
Section: Related Work (mentioning)
confidence: 99%
“…With the development of deep learning, end-to-end neural approaches have rapidly gained prominence in the speech recognition community [25]. However, ASR in complicated scenarios such as meetings is still not a solved problem with challenges including complex acoustic conditions, unknown number of speakers and overlapping speech.…”
Section: Related Work (mentioning)
confidence: 99%