2022
DOI: 10.48550/arxiv.2202.08474
Preprint

Non-Autoregressive ASR with Self-Conditioned Folded Encoders

Abstract: This paper proposes CTC-based non-autoregressive ASR with self-conditioned folded encoders. The proposed method realizes non-autoregressive ASR with fewer parameters by folding the conventional stack of encoders into only two blocks: base encoders and folded encoders. The base encoders convert the input audio features into a neural representation suitable for recognition. The folded encoders are then applied repeatedly for further refinement. Applying the CTC loss to the outputs of all encoders enfo…
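The folding scheme described in the abstract maps to a simple two-stage forward pass: a small base stack runs once, then a single block with one shared set of weights is re-applied several times, with a CTC head attached to every output. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the block sizes, the shared CTC head, and the posterior feedback projection are assumptions based on the abstract and the related self-conditioned CTC literature.

```python
import torch
import torch.nn as nn

class FoldedEncoderASR(nn.Module):
    """Sketch of CTC with self-conditioned folded encoders.

    A small stack of base encoders runs once; one folded block is
    re-applied n_folds times with the same parameters, and a CTC
    projection is attached to every output. Sizes are illustrative,
    not the paper's exact configuration.
    """

    def __init__(self, d_model=256, nhead=4, vocab=500, n_base=2, n_folds=4):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.base = nn.ModuleList(layer() for _ in range(n_base))
        self.folded = layer()            # one parameter set, reused each pass
        self.n_folds = n_folds
        self.ctc_head = nn.Linear(d_model, vocab)   # shared CTC projection
        self.condition = nn.Linear(vocab, d_model)  # feeds posteriors back in

    def forward(self, x):                # x: (batch, time, d_model) features
        outs = []
        for blk in self.base:
            x = blk(x)
        outs.append(self.ctc_head(x))
        for _ in range(self.n_folds):    # same folded weights every iteration
            # self-conditioning: mix the previous CTC posterior back into x
            x = x + self.condition(outs[-1].softmax(-1))
            x = self.folded(x)
            outs.append(self.ctc_head(x))
        return outs
```

During training, the CTC loss would be computed on each element of `outs` and averaged, which is the "CTC loss on the outputs of all encoders" supervision the abstract describes; at inference only the last output is decoded.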

Cited by 2 publications (2 citation statements)
References 17 publications
“…Despite their promising performance, many works show that transformers are over-parameterized [8,9], which makes the models require substantial memory during training and inference and hence limits their use on-device. To reduce the memory cost, some works share the parameters of one or several transformer blocks so that the model's total parameter count is greatly reduced [9,10,11,12,13]. These models use one or a few transformer blocks to encode features recursively, so they have fewer parameters than the original transformers of the same depth.…”
Section: Introduction
mentioning
confidence: 99%
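The parameter saving this statement refers to is easy to quantify: sharing one block's weights across the depth shrinks the parameter count roughly by the depth factor while keeping the same number of layers at inference. A hypothetical comparison, with sizes chosen only for illustration and not tied to any cited model:

```python
import torch.nn as nn

# Rough parameter count: a 12-layer encoder vs. one layer reused 12 times.
# d_model=512 and 8 heads are illustrative assumptions.
layer = lambda: nn.TransformerEncoderLayer(512, 8, batch_first=True)

stacked = nn.ModuleList(layer() for _ in range(12))  # 12 distinct parameter sets
shared = layer()                                     # 1 set, applied 12 times

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"stacked: {count(stacked)/1e6:.1f}M params")  # ~12x the shared count
print(f"shared : {count(shared)/1e6:.1f}M params")   # same depth at inference
```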
“…
Model | Paper | Code | Framework
… | https://arxiv.org/pdf/2201.10103v2.pdf | - | -
S-CFE CTC [159] | https://arxiv.org/pdf/2202.08474v1.pdf | - | -
CASSNAT [119] | https://arxiv.org/pdf/2010.14725v2.pdf | - | -
DLP [120] | https://arxiv.org/pdf/2010.13270.pdf | - | -
CTC-enhanced [104] | https://arxiv.org/pdf/2010.15025 | - | -
Align-Refine [111] | https://aclanthology.org/2021.naacl-main.154.pdf | https://github.com/amazon-research/align-refine | To be released
Align-Denoise [112] | http://dx.doi.org/10.21437/Interspeech.2021-1906 | https://github.com/bobchennan/espnet/tree | Pytorch/Espnet
LASO-BERT [121] | https://arxiv.org/pdf/2102.07594 | - | -
P2M [160] | https://arxiv.org/pdf/2104.02258 | - | -
Pre-train Comformer [161] | https://arxiv.org/pdf/2104.03416v4.pdf | - | -
WNARS [162] | https://arxiv.org/pdf/2104.03587v2.pdf | - | -
Improved CASS-NAT [163] | https://arxiv.org/pdf/2106.09885v2.pdf | - | -
NAT-UBD [164] | https://arxiv.org/pdf/2109.06684v1.pdf | - | -
Conformer-CIF [165] | https://arxiv.org/pdf/2104.04702 | - | -
NAR-BERT-ASR [103] | https://arxiv.org/pdf/2104.04805v1.pdf | - | -
Conditional-Multispk [166] | https://arxiv.org/pdf/2106.08595v1.pdf | https://github.com/pengchengguo/espnet | Pytorch/Espnet
Streaming NAR [167] | https://arxiv.org/pdf/2107.09428v1.pdf | https://github.com/espnet/espnet | Pytorch/Espnet
A-FMLM [168] | https://arxiv.org/pdf/1911.04908.pdf | - | -
Mask-CTC [115] | https://arxiv.org/pdf/2005.08700.pdf | https://github.com/espnet/espnet | Pytorch/Espnet
KERMIT [169] | https://arxiv.org/pdf/2005.13211.pdf | https://github.com/espnet/espnet | Pytorch/Espnet
LSCO [170] | https://arxiv.org/pdf/2005.04862v4.pdf | - | -
Spike-Triggered [171] | https://arxiv.org/pdf/2005.07903v1.pdf | - | -
Intermediate CTC [116] | https://arxiv.org/pdf/2102.03216v1.pdf | https://github.com/espnet/espnet | Pytorch/Espnet
Self-Conditioned CTC [117] | https://arxiv.org/pdf/2104.02724.pdf | https://github.com/espnet/espnet | Pytorch/Espnet
Text to Speech:
BVAE-TTS [130] | https://openreview.net/pdf?id=o3iritJHLfO | https://github.com/LEEYOONHYUNG/BVAE-TTS | Pytorch
vTTS [172] | https://arxiv.org/pdf/2203.14725.pdf | - | -
Gan-TTS [134] | https://arxiv.org/pdf/2203.01080.pdf | https://github.com/yanggeng1995/GAN-TTS | Pytorch
VARA-TTS [129] | http…
…”
mentioning
confidence: 99%