2019
DOI: 10.48550/arxiv.1906.02041
Preprint

Imitation Learning for Non-Autoregressive Neural Machine Translation

Cited by 20 publications (13 citation statements)
References 16 publications
“…We also compare to other parallel generation methods. These include a latent variable approach: FlowSeq [28]; refinement-based approaches: CMLM [11], Levenshtein transformer [15] and SMART [12]; a mixed approach: Imputer [41]; reinforcement learning: Imitate-NAT [55]; and another sequence-based approach: NART-DCRF [49] which combines a non-autoregressive model with a 1st-order CRF. Several of these methods use fully autoregressive reranking [13], which generally gives further improvements but requires a separate test-time model.…”
Section: Methods
confidence: 99%
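The autoregressive reranking mentioned in this citation can be illustrated with a short sketch: the parallel model proposes several candidate translations (for example, one per predicted target length), and a separate autoregressive model scores them at test time to pick the best. The functions `nat_generate` and `ar_log_prob` below are hypothetical stand-ins, not APIs from any of the cited systems.

```python
# Hedged sketch of autoregressive reranking for a non-autoregressive translator.
from typing import Callable, List, Sequence


def rerank_candidates(
    source: Sequence[str],
    candidate_lengths: List[int],
    nat_generate: Callable[[Sequence[str], int], List[str]],
    ar_log_prob: Callable[[Sequence[str], List[str]], float],
) -> List[str]:
    """Decode one hypothesis per candidate target length with the parallel model,
    then keep the hypothesis that the autoregressive model scores highest."""
    candidates = [nat_generate(source, n) for n in candidate_lengths]
    scores = [ar_log_prob(source, c) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]
```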
“…There has been extensive interest in non-autoregressive/parallel generation approaches, aiming at producing a sequence in parallel sub-linear time w.r.t. sequence length [13,52,26,65,53,14,11,12,48,15,28,16,49,55,30,41,64,62]. Existing approaches can be broadly classified as latent variable based [13,26,65,28,41], refinement-based [25,48,14,15,11,30,12,62] or a combination of both [41].…”
Section: Related Work
confidence: 99%
“…This also alleviates the multimodality problem (Gu et al 2017) in non-autoregressive generation. Many non-autoregressive translation methods have been proposed for better alignment, such as fertility (Gu et al 2017), SoftCopy (Wei et al 2019), or adding a reordering module (Ran et al 2019). However, the source and target words in monolingual generation tasks cannot be aligned directly like translation.…”
Section: Motivation
confidence: 99%
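A minimal sketch of a SoftCopy-style alignment follows, assuming decoder inputs are built as a softmax-weighted mixture of source embeddings based on position distance; the temperature `tau` and the exact weighting are assumptions, and the formulation in Wei et al. (2019) may differ in detail.

```python
# Sketch of a SoftCopy-style decoder-input initialization: each target position
# receives a position-distance-weighted mixture of source token embeddings.
import numpy as np


def soft_copy(src_emb: np.ndarray, tgt_len: int, tau: float = 0.3) -> np.ndarray:
    """src_emb: (src_len, d) source embeddings -> (tgt_len, d) decoder inputs."""
    src_len = src_emb.shape[0]
    src_pos = np.arange(src_len)
    tgt_pos = np.arange(tgt_len)
    # Negative absolute distance between target and source positions, scaled by tau.
    logits = -np.abs(tgt_pos[:, None] - src_pos[None, :]) / tau
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over source positions
    return weights @ src_emb                        # (tgt_len, d)
```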
“…We use the BERT-base model (n_layers = 12, n_heads = 12, d_hidden = 768) and the BERT-small model (n_layers = 12, n_heads = 12, d_hidden = 384) as our backbones, based on huggingface transformers (Wolf et al 2020). The model weights are initialized with unilm1.2base-uncased (Bao et al 2020), bert-base-uncased (Devlin et al 2018), and MiniLMv1-L12-H384-uncased (Wang et al 2020), respectively.…”
Section: Experimental Setting
confidence: 99%
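A sketch of how such backbones could be loaded and inspected with huggingface transformers; only bert-base-uncased is loaded by name here, since the exact hub identifiers for the UniLM and MiniLM checkpoints quoted above would need to be checked.

```python
from transformers import AutoConfig, AutoModel

# BERT-base backbone: 12 layers, 12 heads, hidden size 768.
base = AutoModel.from_pretrained("bert-base-uncased")
print(base.config.num_hidden_layers, base.config.num_attention_heads, base.config.hidden_size)

# A BERT-small-style config (hidden size 384, 12 layers, 12 heads). In the quoted
# setting the weights would come from a MiniLM checkpoint rather than random init.
small_cfg = AutoConfig.from_pretrained(
    "bert-base-uncased",
    hidden_size=384,
    intermediate_size=1536,
)
small = AutoModel.from_config(small_cfg)  # randomly initialized here
```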
“…The early pioneering work on parallel generation is [42], which generated the words of selected objects first and filled in the rest of the sentence with a two-pass process. Several recent works attempt to accelerate generation by introducing a NAIC framework [14,37], which produces the entire sentence simultaneously. Although they accelerate decoding significantly, NAIC models suffer from word repetition and omission problems.…”
Section: Related Work
confidence: 99%
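The repetition and omission problems mentioned above stem from predicting every position independently in a single parallel pass; the toy sketch below, with random logits standing in for a NAIC-style decoder's outputs, makes that independence explicit.

```python
# Toy illustration of single-pass parallel decoding: every position is predicted
# simultaneously by an independent argmax, with no conditioning on the other
# predicted words, which is what permits repeated or missing words.
import numpy as np


def parallel_decode(logits: np.ndarray) -> list:
    """logits: (tgt_len, vocab) per-position scores -> list of word ids."""
    return logits.argmax(axis=-1).tolist()


rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 10))
print(parallel_decode(logits))  # may contain the same word id at several positions
```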