Interspeech 2022
DOI: 10.21437/interspeech.2022-9996

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Abstract: Transformers have recently dominated the ASR field. Although able to yield good performance, they involve an autoregressive (AR) decoder that generates tokens one by one, which is computationally inefficient. To speed up inference, non-autoregressive (NAR) methods, e.g. single-step NAR, were designed to enable parallel generation. However, due to an independence assumption within the output tokens, the performance of single-step NAR is inferior to that of AR models, especially with a large-scale corpus. There are two…
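The trade-off described in the abstract can be illustrated with a toy sketch: an AR decoder must take one sequential step per token because each step conditions on the previous one, while a single-step NAR decoder scores every output position in one parallel pass under an independence assumption between tokens. The following is a minimal NumPy illustration of that contrast, not the paper's model; the shapes and the toy recurrence are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L, D = 10, 5, 8                   # toy vocab size, output length, hidden dim
enc = rng.standard_normal((L, D))    # stand-in encoder states
W = rng.standard_normal((D, V))      # stand-in output projection

def ar_decode(enc, W):
    """Autoregressive: one sequential step per token; step t depends on
    the state from step t-1, so the loop cannot be parallelized."""
    tokens, state = [], np.zeros(enc.shape[1])
    for t in range(len(enc)):
        state = np.tanh(enc[t] + state)
        tokens.append(int(np.argmax(state @ W)))
    return tokens

def nar_decode(enc, W):
    """Single-step non-autoregressive: all positions are scored in one
    parallel matrix operation, assuming output tokens are independent."""
    return (np.tanh(enc) @ W).argmax(axis=-1).tolist()

print(ar_decode(enc, W))
print(nar_decode(enc, W))
```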

Cited by 29 publications (3 citation statements) · References 28 publications
“…Therefore, its performance on smaller datasets is inferior to that of RNNT. Paraformer, a non-autoregressive model, enhances contextual modeling via the GLM [44] sampler but underperforms compared to autoregressive models in this experiment. Paraformer (U2) also employs non-autoregressive decoding and adds CTC loss for joint optimization, improving the encoder’s feature extraction capability.…”
Section: Experiments and Analysis
Mentioning confidence: 98%
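The joint optimization mentioned in this statement is commonly realized as a weighted sum of a CTC loss on the encoder branch and a cross-entropy loss on the (non-autoregressive) decoder branch. Below is a minimal PyTorch sketch of that generic recipe; the interpolation weight lambda_ctc and all tensor shapes are illustrative assumptions, not values from the cited papers.

```python
import torch
import torch.nn.functional as F

def joint_loss(ctc_log_probs, input_lens, dec_logits, targets, target_lens,
               lambda_ctc=0.3, blank=0):
    """Weighted sum of CTC (encoder branch) and cross-entropy (decoder
    branch); lambda_ctc is a hypothetical interpolation weight."""
    # ctc_log_probs: (T, B, V) log-softmax outputs of the encoder branch
    ctc = F.ctc_loss(ctc_log_probs, targets, input_lens, target_lens, blank=blank)
    # dec_logits: (B, L, V) decoder outputs; cross_entropy wants (B, V, L)
    ce = F.cross_entropy(dec_logits.transpose(1, 2), targets)
    return lambda_ctc * ctc + (1.0 - lambda_ctc) * ce

# Toy usage with random tensors (shapes are assumptions).
T, B, L, V = 50, 2, 6, 30
loss = joint_loss(torch.randn(T, B, V).log_softmax(-1),
                  torch.full((B,), T),
                  torch.randn(B, L, V),
                  torch.randint(1, V, (B, L)),   # labels avoid the blank id 0
                  torch.full((B,), L))
print(loss.item())
```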
“…Speech Synthesis: Due to the significant difference in the pronunciation synthesis of different languages, we use different models for different languages. For Chinese, Gao et al. (2022) proposed a single-round non-autoregressive model, Paraformer. The predictor module is used to predict the number of target characters in the speech, and the sampler transforms the acoustic feature vector and the target character vector into a feature vector containing semantic information.…”
Section: Speech Recognition
Mentioning confidence: 99%
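The predictor/sampler pipeline summarized in this statement can be sketched in two steps: a CIF-style predictor accumulates per-frame weights, so the total weight estimates the number of target characters and each unit-weight crossing emits one acoustic embedding; a GLM-style sampler then replaces a subset of those embeddings with target character embeddings to inject semantic context. The NumPy toy below is a simplified sketch; the weight network, shapes, and the 0.5 sampling ratio are all assumptions rather than details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 20, 8                        # frames, feature dim (illustrative)
h = rng.standard_normal((T, D))     # stand-in encoder outputs

# Predictor (CIF-style sketch): per-frame weights in (0, 1); their sum
# estimates the number of target characters.
alpha = 1.0 / (1.0 + np.exp(-(h @ rng.standard_normal(D))))
n_tokens = int(round(alpha.sum()))

def integrate_and_fire(h, alpha):
    """Accumulate weighted frames; each time the running weight crosses
    1.0, a token boundary 'fires' and one acoustic embedding is emitted."""
    out, acc, buf = [], 0.0, np.zeros(h.shape[1])
    for t in range(len(h)):
        acc += alpha[t]
        buf += alpha[t] * h[t]
        if acc >= 1.0:
            out.append(buf.copy())
            acc -= 1.0
            buf = np.zeros(h.shape[1])
    return np.stack(out) if out else np.zeros((0, h.shape[1]))

acoustic_emb = integrate_and_fire(h, alpha)

# Sampler (GLM-style sketch): replace a random subset of acoustic
# embeddings with target character embeddings so the decoder sees mixed
# semantic/acoustic context. Both the stand-in character embeddings and
# the 0.5 ratio are hypothetical.
char_emb = rng.standard_normal(acoustic_emb.shape)
mask = rng.random(len(acoustic_emb)) < 0.5
mixed = np.where(mask[:, None], char_emb, acoustic_emb)
print(f"{len(acoustic_emb)} acoustic embeddings, {int(mask.sum())} replaced")
```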
“…Non-autoregressive speech processing was first used in [18]. After that, many more non-autoregressive methods were proposed [19–25]. Among these methods, there are two that are appropriate for achieving non-autoregressive spell correction.…”
Section: Introduction
Mentioning confidence: 99%