2021
DOI: 10.48550/arxiv.2110.11115
Preprint

Improving Non-autoregressive Generation with Mixup Training

Abstract: While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them into non-autoregressive generation tasks remains a challenge. To solve this problem, we present a non-autoregressive generation model based on pre-trained transformer models. To bridge the gap between autoregressive and non-autoregressive models, we propose a simple and effective iterative training method called MIx Source and pseudo Target (MIST). Unlike other iterativ…
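The abstract is truncated on this page, so the exact MIST procedure is not visible here. As a rough, hypothetical illustration only, a "mix source and pseudo target" training step might look like the PyTorch sketch below. The function names, the token-level mixing rule, the `mix_ratio` parameter, and the `nar_model.generate` call are all assumptions introduced for this sketch, not the authors' published recipe.

```python
import torch
import torch.nn.functional as F

def build_mixed_input(src_ids, pseudo_tgt_ids, mix_ratio=0.5):
    """Randomly replace a fraction of positions in the source with
    pseudo-target tokens. Assumes both tensors share shape (batch, seq_len)."""
    mix_mask = torch.rand(src_ids.shape, device=src_ids.device) < mix_ratio
    return torch.where(mix_mask, pseudo_tgt_ids, src_ids)

def mixup_style_training_step(nar_model, optimizer, src_ids, tgt_ids, mix_ratio=0.5):
    # 1) Decode pseudo targets with the current non-autoregressive model.
    #    `nar_model.generate` is a placeholder for whatever parallel decoding
    #    routine the model exposes; it is not an API from the paper.
    with torch.no_grad():
        pseudo_tgt_ids = nar_model.generate(src_ids)
    # 2) Mix source tokens with pseudo-target tokens to form a new input.
    mixed_input = build_mixed_input(src_ids, pseudo_tgt_ids, mix_ratio)
    # 3) Train on the mixed input against the gold target with the usual
    #    position-wise (non-autoregressive) cross-entropy loss.
    logits = nar_model(mixed_input)  # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tgt_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```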

Cited by 5 publications (5 citation statements)
References 29 publications
“…Specifically, we report the unigram ROUGE-1 and bigram ROUGE-2 overlap to assess the informativeness, and the longest common subsequence ROUGE-L score to assess the fluency. We compare our AMOM with the original CMLM and several NAR baseline models, including vanilla NAT (Gu et al 2018), InsertNAR (Stern et al 2019), Levenshtein (Gu, Wang, and Zhao 2019), Disco (Kasai et al 2020a), POSPD (Yang et al 2021), CMLM (Ghazvininejad et al 2019), BANG (Qi et al 2021), MIST (Jiang et al 2021), ELMER (Li et al 2022a). Results show that AMOM outperforms all other NAR models without pre-training.…”
Section: Results (mentioning)
confidence: 99%
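For context on the metrics named in this statement: ROUGE-1 and ROUGE-2 measure unigram and bigram overlap, while ROUGE-L is based on the longest common subsequence. A minimal way to compute them is sketched below using the `rouge_score` package; this is a generic example, not the evaluation code used by the cited papers.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat sat on the mat"
prediction = "a cat was sitting on the mat"

# score(target, prediction) returns a dict of Score tuples
# with precision, recall, and F-measure for each ROUGE variant.
scores = scorer.score(reference, prediction)
for name, result in scores.items():
    print(f"{name}: P={result.precision:.3f} R={result.recall:.3f} F1={result.fmeasure:.3f}")
```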
“…The inference efficiency is not only required for neural machine translation but also indispensable for many other text generation tasks [81], [100], [101]. Existing works introducing NAR techniques into text generation tasks focus on automatic speech recognition [102], [103], [104], text summarization [105], grammatical error correction [106], [107], and dialogue [108], [109].…”
Section: Text Generation (mentioning)
confidence: 99%
“…to enhance auto-regressive models with powerful pretraining techniques and models, with impressive performance being achieved. However, only very few papers apply these powerful pre-trained models to help NAR models [80], [101], and there is only a preliminary exploration of the pre-training techniques for NAR models [40], [105]. Thus, it is promising to explore pretraining methods for non-autoregressive generation and other related tasks.…”
Section: Conclusion and Outlooks (mentioning)
confidence: 99%
“…Text generation can also be done with masked-language models, e.g. BERT [Su et al, 2021, Jiang et al, 2021]. Apart from left-to-right text generation, estimating sentence/word probabilities from masked language models is also useful for various NLP tasks.…”
Section: MLM (mentioning)
confidence: 99%
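One common way to estimate sentence probabilities from a masked language model, as mentioned in the quoted statement, is pseudo-log-likelihood scoring: mask each position in turn and sum the log-probabilities of the true tokens. The sketch below uses Hugging Face `transformers` with `bert-base-uncased` as an arbitrary example model; it illustrates the general recipe, not the cited papers' exact method.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    total = 0.0
    # Skip [CLS] and [SEP]; mask each remaining position in turn and
    # accumulate the log-probability of the original token at that position.
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[input_ids[i]].item()
    return total

print(pseudo_log_likelihood("The quick brown fox jumps over the lazy dog."))
```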