2020
DOI: 10.1609/aaai.v34i05.6413
Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior

Abstract: Although neural machine translation models have reached high translation quality, their autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose LaNMT, a latent-variable non-autoregressive model with continuous latent variables and a deterministic inference procedure. In contrast to existing approaches, we use a deterministic inference algorithm to find the target sequence that maximizes the lower bound to the log-probability…
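As a concrete illustration of the deterministic inference the abstract describes, here is a minimal sketch of the alternating procedure: initialize the latent at the prior mean, decode a target deterministically, then re-estimate the latent from the decoded target, repeating until a fixed point or an iteration cap. The `decode` and `posterior_mean` functions and the weight matrices below are toy stand-ins invented for this sketch; the actual LaNMT model parameterizes these components with Transformer networks.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent dimensionality (illustrative; per-position in the real model)

# Hypothetical stand-ins for the learned networks (assumptions, not LaNMT's
# actual parameterization, which uses Transformers):
W_dec = rng.normal(size=(D, D))   # "decoder": latent -> target representation
W_post = rng.normal(size=(D, D))  # "posterior": target -> latent mean

def decode(z):
    """argmax_y p(y | z, x): a deterministic map for illustration."""
    return np.tanh(z @ W_dec)

def posterior_mean(y):
    """Mean of q(z | x, y); the delta posterior collapses q to this point."""
    return np.tanh(y @ W_post)

# Step 1: initialize z at the (stand-in) prior mean.
z = rng.normal(size=D) * 0.1
for step in range(10):
    y = decode(z)              # Step 2: decode the target deterministically
    z_new = posterior_mean(y)  # Step 3: re-estimate z from the decoded target
    if np.allclose(z_new, z, atol=1e-4):
        break                  # fixed point: the lower bound stops improving
    z = z_new
print("latent after", step + 1, "refinement steps:", np.round(z, 3))
```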

Cited by 84 publications (106 citation statements)
References 15 publications
“…We first conduct experiments to compare the performance of FlowSeq with strong baseline models, including NAT w/ Fertility (Gu et al., 2018), NAT-IR, NAT-REG (Wang et al., 2019), LV NAR (Shu et al., 2019), CTC Loss (Libovický and Helcl, 2018), and CMLM (Ghazvininejad et al., 2019).…”
Section: Results
confidence: 99%
“…Our baseline models include an LSTM sequence-to-sequence model with attention, Transformer (Vaswani et al., 2017), and a non-autoregressive model, LaNMT (Shu et al., 2020). For a fair comparison, we trained all models with negative log-likelihood loss or, where applicable, knowledge distillation (Kim and Rush, 2016).…”
Section: Multi30k Translation
confidence: 99%
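The knowledge distillation referenced in this excerpt (Kim and Rush, 2016) is sequence-level: the student model is trained on the teacher's decoded outputs rather than the gold references. A minimal sketch, assuming a hypothetical `teacher_translate` callable (invented here, not an API from the cited work):

```python
# Sequence-level knowledge distillation sketch: replace gold references with
# the teacher's own translations, then train the student with NLL on them.
def distill_dataset(teacher_translate, sources):
    """Build (source, distilled-target) pairs from teacher outputs."""
    return [(src, teacher_translate(src)) for src in sources]

# Toy demonstration with a stand-in "teacher" that uppercases its input.
pairs = distill_dataset(lambda s: s.upper(), ["ein kleiner test"])
print(pairs)  # the student would then minimize NLL on these distilled pairs
```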
“…Latent variable models such as variational autoencoders and adversarial autoencoders assume the existence of unobserved (latent) variables Z = {z_1, z_2, …, z_k} that aim to capture dependencies among the vertices V and edges E of a graph G. Unlike an autoregressive model, a latent variable model does not necessarily require a predefined ordering of the graph [14]. The generation process consists of first sampling latent variables according to their prior distributions, followed by sampling vertices and edges conditioned on these latent variable samples.…”
Section: Introduction
confidence: 99%
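The two-stage generation process this excerpt describes can be illustrated with a toy sketch: draw latents from a standard-normal prior, then sample vertices and edges from Bernoulli distributions whose parameters are decoded from the latents. The weight matrices here are random stand-ins invented for illustration; real graph latent-variable models learn neural decoders. Note that no vertex ordering is needed, since all vertices and edges are sampled conditionally independently given Z.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 6  # number of latent variables and vertices (illustrative)

# Step 1: sample latent variables Z from their prior (standard normal here).
z = rng.normal(size=k)

# Hypothetical decoder weights, invented for this sketch; real graph VAEs
# learn neural decoders for p(V, E | Z).
W_v = rng.normal(size=(k, n))      # latent -> vertex existence logits
W_e = rng.normal(size=(k, n * n))  # latent -> edge existence logits

# Step 2: sample vertices and edges conditioned on the latent sample.
vertex_probs = 1.0 / (1.0 + np.exp(-(z @ W_v)))
edge_probs = (1.0 / (1.0 + np.exp(-(z @ W_e)))).reshape(n, n)

vertices = rng.random(n) < vertex_probs          # Bernoulli vertex samples
upper = np.triu(rng.random((n, n)) < edge_probs, 1)
adjacency = upper | upper.T                      # symmetric, undirected graph

print("vertices:", vertices.astype(int))
print("adjacency:\n", adjacency.astype(int))
```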