Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1125

Imitation Learning for Non-Autoregressive Neural Machine Translation

Abstract: Non-autoregressive translation (NAT) models have achieved impressive inference speedups. A potential issue of existing NAT algorithms, however, is that decoding is conducted in parallel, without directly considering previous context. In this paper, we propose an imitation learning framework for non-autoregressive machine translation, which still enjoys fast translation speed but achieves translation performance comparable to its autoregressive counterpart. We conduct experiments on the IWSLT16…
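The speedup claim rests on replacing the token-by-token decoding loop with a single parallel pass over all target positions. Below is a minimal, hypothetical sketch of that contrast; `decoder_step` and `decoder_parallel` are illustrative stand-ins, not the paper's actual model.

```python
# Minimal sketch (not the paper's implementation): the speed difference comes from
# how many decoder calls are needed to produce a target of length tgt_len.
import torch

vocab_size, hidden = 100, 16
torch.manual_seed(0)
proj = torch.nn.Linear(hidden, vocab_size)

def decoder_step(prev_states):
    """One autoregressive step: logits for the next token, given all previous states."""
    return proj(prev_states.mean(dim=0, keepdim=True))

def decoder_parallel(positions):
    """Non-autoregressive decoding: logits for every position in one pass."""
    return proj(positions)

tgt_len = 8
positions = torch.randn(tgt_len, hidden)

# Autoregressive: tgt_len sequential decoder calls.
states = [torch.zeros(1, hidden)]
ar_tokens = []
for _ in range(tgt_len):
    logits = decoder_step(torch.cat(states, dim=0))
    ar_tokens.append(int(logits.argmax(dim=-1)))
    states.append(torch.randn(1, hidden))  # stand-in for the newly produced decoder state

# Non-autoregressive: a single parallel decoder call.
nat_tokens = decoder_parallel(positions).argmax(dim=-1).tolist()
print(ar_tokens, nat_tokens)
```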

Citations: cited by 52 publications (67 citation statements)
References: 23 publications
“…Instead of directly copying the source embeddings to the decoder input, we use an interpolated version of the encoder outputs as the decoder input, which allows the encoder to transform the source embeddings into a more usable form. Note that a similar technique is adopted in Wei et al. (2019), but our model structure and optimization are much simpler, as we do not have any imitation module for detailed teacher guidance.…”
Section: Results and Analysis
confidence: 99%
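The technique quoted above, feeding an interpolated version of the encoder outputs to the decoder, can be sketched as a simple length resampling of the encoder states. The snippet below is an illustrative assumption of that idea, not the exact formulation from either paper.

```python
# A minimal sketch of using interpolated encoder outputs as the decoder input
# (instead of directly copying source embeddings).
import torch
import torch.nn.functional as F

def interpolate_encoder_outputs(enc_out: torch.Tensor, tgt_len: int) -> torch.Tensor:
    """Resample encoder outputs (src_len, hidden) to (tgt_len, hidden) by linear interpolation."""
    # F.interpolate expects (batch, channels, length), so move hidden to the channel dim.
    x = enc_out.t().unsqueeze(0)                       # (1, hidden, src_len)
    y = F.interpolate(x, size=tgt_len, mode="linear", align_corners=True)
    return y.squeeze(0).t()                            # (tgt_len, hidden)

enc_out = torch.randn(5, 8)      # 5 source positions, hidden size 8
dec_in = interpolate_encoder_outputs(enc_out, tgt_len=7)
print(dec_in.shape)              # torch.Size([7, 8])
```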
“…These translation candidates would then be ranked by the AR teacher to select the one with the highest probability. This is referred to as length-parallel decoding in Wei et al. (2019).…”
Section: Length Prediction
confidence: 99%
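Length-parallel decoding, as described above, amounts to decoding one candidate per target length and letting the autoregressive teacher keep the highest-scoring one. The sketch below uses dummy `nat_decode` and `teacher_log_prob` functions, which are illustrative stand-ins, to show the selection step only.

```python
# A minimal sketch of length-parallel decoding: one candidate per target length,
# then the autoregressive teacher scores the candidates and the best one is kept.
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def nat_decode(length: int):
    """Non-autoregressive decode for a given target length (dummy output)."""
    return [random.choice(VOCAB) for _ in range(length)]

def teacher_log_prob(tokens):
    """Autoregressive teacher scoring (dummy: favours medium-length outputs)."""
    return -abs(len(tokens) - 5) - 0.1 * len(tokens)

predicted_len = 5
candidates = [nat_decode(predicted_len + d) for d in range(-2, 3)]  # lengths 3..7
best = max(candidates, key=teacher_log_prob)
print(best)
```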
“…Imitation Learning Imitation learning, acquiring skills from observing demonstrations, has proven to be promising in structured prediction, such as alleviating the exposure bias problem (Zhang et al., 2019b), transferring knowledge to guide non-autoregressive translation models (Gu et al., 2018; Wei et al., 2019), and automatically learning the reward of a dialogue system (Li et al., 2019b). In our work, the conventional dialogue model as a student mimics the scenario-based dialogue model on both the output layer and intermediate layers.…”
Section: Related Work
confidence: 99%
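Mimicking a teacher "on both the output layer and intermediate layers", as the quoted work describes, is commonly implemented as a weighted sum of a distribution-matching loss on the outputs and hidden-state losses on the layers. The following sketch is a generic illustration under that assumption; the layer counts, sizes, and weighting are made up for the example and are not taken from the cited papers.

```python
# A minimal sketch of imitation on both the output layer and intermediate layers:
# the student matches the teacher's output distribution (KL) and its hidden states (MSE).
import torch
import torch.nn.functional as F

def imitation_loss(student_hiddens, teacher_hiddens, student_logits, teacher_logits, alpha=0.5):
    # Output-layer imitation: KL between teacher and student token distributions.
    out_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # Intermediate-layer imitation: match hidden states layer by layer.
    hid_loss = sum(F.mse_loss(s, t) for s, t in zip(student_hiddens, teacher_hiddens))
    return out_loss + alpha * hid_loss

layers, seq, hidden, vocab = 3, 4, 8, 20
student_h = [torch.randn(seq, hidden, requires_grad=True) for _ in range(layers)]
teacher_h = [torch.randn(seq, hidden) for _ in range(layers)]
student_logits = torch.randn(seq, vocab, requires_grad=True)
teacher_logits = torch.randn(seq, vocab)
print(imitation_loss(student_h, teacher_h, student_logits, teacher_logits))
```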
“…Non-autoregressive neural machine translation began with the work of Gu et al. (2018a), who found benefit from using knowledge distillation (Hinton et al., 2015), and in particular sequence-level distilled outputs (Kim and Rush, 2016). Subsequent work has narrowed the gap between non-autoregressive and autoregressive translation, including multi-iteration refinements (Lee et al., 2018; Ghazvininejad et al., 2019; Saharia et al., 2020; Kasai et al., 2020) and rescoring with autoregressive models (Kaiser et al., 2018; Wei et al., 2019; Ma et al., 2019). … and Saharia et al. (2020) proposed aligned cross entropy or latent alignment models and achieved the best results of all non-autoregressive models without refinement or rescoring.…”
Section: Related Work
confidence: 99%
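Sequence-level knowledge distillation (Kim and Rush, 2016), mentioned above as a key ingredient for NAT training, replaces the reference translations with the autoregressive teacher's own outputs before training the student. A toy sketch of that data-preparation step, with a dummy `teacher_translate` standing in for beam-search output from a real teacher:

```python
# A minimal sketch of sequence-level knowledge distillation: the student is trained
# on the teacher's translations of the source side rather than the original references.
def teacher_translate(src: str) -> str:
    """Dummy teacher: in practice, the beam-search output of a trained AR model."""
    return src.upper()

def build_distilled_corpus(parallel_data):
    """Replace each reference with the teacher's translation of its source sentence."""
    return [(src, teacher_translate(src)) for src, _ref in parallel_data]

data = [("ein kleiner test", "a small test"), ("guten morgen", "good morning")]
distilled = build_distilled_corpus(data)
print(distilled)
```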