Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.1
AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate

Abstract: Non-autoregressive neural machine translation (NART) models suffer from the multi-modality problem, which causes translation inconsistencies such as token repetition. Most recent approaches have attempted to solve this problem by implicitly modeling dependencies between outputs. In this paper, we introduce AligNART, which leverages full alignment information to explicitly reduce the modality of the target distribution. AligNART divides the machine translation task into (i) alignment estimation and (ii) translation […]
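The two-stage decomposition named in the abstract can be illustrated with a minimal sketch: first pick, for each target position, the source token it aligns to, then translate each aligned token. The alignment vector and the toy lexicon below are illustrative assumptions for the example, not AligNART's actual aligner or decoder modules.

```python
# Minimal sketch of "(i) alignment estimation, (ii) translation".
# alignment[j] = index of the source token that target position j aligns to.

def align_then_translate(source_tokens, alignment, lexicon):
    aligned = [source_tokens[i] for i in alignment]    # (i) reorder/duplicate per alignment
    return [lexicon.get(tok, tok) for tok in aligned]  # (ii) token-wise translation

src = ["maison", "rouge"]
alignment = [1, 0]  # target word order differs from source order
lexicon = {"maison": "house", "rouge": "red"}
print(align_then_translate(src, alignment, lexicon))  # → ['red', 'house']
```

Because the alignment fixes both the target length and which source token conditions each position, each position's output distribution has fewer plausible modes, which is the intuition behind "reducing the modality of the target distribution."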


Cited by 20 publications (12 citation statements)
References 25 publications
“…Gu et al [16] pre-define the latent variable Z as fertility and use it to determine how many target words every source word is aligned to. Song et al [62] predict the alignment by an aligner module as the latent variable Z. Position Information of Target Tokens.…”
Section: Latent Variable-based Methods
confidence: 99%
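The fertility mechanism described in the statement above (Gu et al. [16]) is easy to sketch: each source token gets a fertility count saying how many target tokens it aligns to, and the decoder input is the source sequence with each token copied fertility-many times. The fertilities below are hand-picked for illustration, not model predictions.

```python
# Hedged sketch of fertility-based decoder-input construction (Gu et al. [16] style).

def expand_by_fertility(source_tokens, fertilities):
    """Copy each source token `fertility` times; target length = sum(fertilities)."""
    assert len(source_tokens) == len(fertilities)
    decoder_input = []
    for tok, f in zip(source_tokens, fertilities):
        decoder_input.extend([tok] * f)
    return decoder_input

src = ["ich", "habe", "hunger"]
fert = [1, 0, 2]  # fertility 0 drops a token; fertility 2 duplicates one
print(expand_by_fertility(src, fert))  # → ['ich', 'hunger', 'hunger']
```

Fertility thus serves as the latent variable Z: once it is fixed, the target length and the source-to-target correspondence are both determined, and the aligner module of Song et al. [62] plays the analogous role with full alignments instead of per-token counts.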
“…FT-NAT [16] ENAT [23] NAT-REG [22] FlowSeq [57] AXE-NAT [70] Fully-NAT [38] OAXE-NAT [39] AligNART [62] DAD [68] RefineNAT [29] Insertion Transformer [53] Levenshtein [54] JM-NAT [55] Imputer [36] Multi-Task [78] RewriteNAT [26] CMLMC [56] Fully NAT Iterative NAT Fig. 9.…”
Section: BLEU Score
confidence: 99%
“…Non-autoregressive Decoding To address the inefficiency of autoregressive decoding for seq2seq generation, Gu et al. (2018) first proposed non-autoregressive decoding for Machine Translation, which decodes the output sentence in one single iteration despite translation quality loss. Recent work mainly focused on improving the quality while maintaining competitive speedups, including applying various training objectives (Ghazvininejad et al., 2020a; Saharia et al., 2020; Du et al., 2021), modeling dependencies between target tokens (Ghazvininejad et al., 2019; Qian et al., 2021; Song et al., 2021; Gu & Kong, 2021) and refining the translation outputs with multi-pass iterations (Ghazvininejad et al., 2020b; Kasai et al., 2020; Geng et al., 2021; Savinov et al., 2021; Huang et al., 2022). However, due to the inherent conditional independence assumption, non-autoregressive decoding's quality is generally less reliable than the autoregressive counterpart.…”
Section: Related Work
confidence: 99%
“…Here, the predicted full-sentence length can be considered as a latent variable during translating, aiming to help model the complex sequential dependency between incomplete source words, where introducing latent variables has been proven to provide effective help for modeling sequential dependency (Lee et al., 2018; Su et al., 2018; Shu et al., 2020; Song et al., 2021). Owing to the full-sentence length as the latent variable, the model has a stronger ability to model the sequential dependency, thereby reducing position bias.…”
Section: A Theoretical Analysis Of Position Bias In SimT
confidence: 99%