2020
DOI: 10.1609/aaai.v34i05.6413
Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior

Abstract: Although neural machine translation models have reached high translation quality, their autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose LaNMT, a latent-variable non-autoregressive model with continuous latent variables and a deterministic inference procedure. In contrast to existing approaches, we use a deterministic inference algorithm to find the target sequence that maximizes the lower bound to the log-probability…
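As a concrete illustration of the deterministic inference the abstract describes, here is a minimal sketch of the alternating procedure: initialize the latent at the prior mean, decode a target deterministically, then re-estimate the latent from the decoded target, repeating until a fixed point or an iteration cap. The `decode` and `posterior_mean` functions and the weight matrices below are toy stand-ins invented for this sketch; the actual LaNMT model parameterizes these components with Transformer networks.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent dimensionality (illustrative; per-position in the real model)

# Hypothetical stand-ins for the learned networks (assumptions, not LaNMT's
# actual parameterization, which uses Transformers):
W_dec = rng.normal(size=(D, D))   # "decoder": latent -> target representation
W_post = rng.normal(size=(D, D))  # "posterior": target -> latent mean

def decode(z):
    """argmax_y p(y | z, x): a deterministic map for illustration."""
    return np.tanh(z @ W_dec)

def posterior_mean(y):
    """Mean of q(z | x, y); the delta posterior collapses q to this point."""
    return np.tanh(y @ W_post)

# Step 1: initialize z at the (stand-in) prior mean.
z = rng.normal(size=D) * 0.1
for step in range(10):
    y = decode(z)              # Step 2: decode the target deterministically
    z_new = posterior_mean(y)  # Step 3: re-estimate z from the decoded target
    if np.allclose(z_new, z, atol=1e-4):
        break                  # fixed point: the lower bound stops improving
    z = z_new
print("latent after", step + 1, "refinement steps:", np.round(z, 3))
```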

Cited by 84 publications (106 citation statements)
References 15 publications
“…We first conduct experiments to compare the performance of FlowSeq with strong baseline models, including NAT w/ Fertility (Gu et al., 2018), NAT-IR, NAT-REG (Wang et al., 2019), LV NAR (Shu et al., 2019), CTC Loss (Libovický and Helcl, 2018), and CMLM (Ghazvininejad et al., 2019).…”
Section: Results
confidence: 99%
“…Our baseline models include an LSTM sequence-to-sequence model with attention, Transformer (Vaswani et al., 2017), and a non-autoregressive model, LaNMT (Shu et al., 2020). For a fair comparison, we trained all models with negative log-likelihood loss or, where applicable, knowledge distillation (Kim and Rush, 2016).…”
Section: Multi30k Translation
confidence: 99%
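The knowledge distillation referenced in this excerpt (Kim and Rush, 2016) is sequence-level: the student model is trained on the teacher's decoded outputs rather than the gold references. A minimal sketch, assuming a hypothetical `teacher_translate` callable (invented here, not an API from the cited work):

```python
# Sequence-level knowledge distillation sketch: replace gold references with
# the teacher's own translations, then train the student with NLL on them.
def distill_dataset(teacher_translate, sources):
    """Build (source, distilled-target) pairs from teacher outputs."""
    return [(src, teacher_translate(src)) for src in sources]

# Toy demonstration with a stand-in "teacher" that uppercases its input.
pairs = distill_dataset(lambda s: s.upper(), ["ein kleiner test"])
print(pairs)  # the student would then minimize NLL on these distilled pairs
```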
“…Latent variable models such as variational autoencoders and adversarial autoencoders assume the existence of unobserved (latent) variables Z = {z_1, z_2, …, z_k} that aim to capture dependencies among the vertices V and edges E of a graph G. Unlike an autoregressive model, a latent variable model does not necessarily require a predefined ordering of the graph [14]. The generation process consists of first sampling latent variables according to their prior distributions, followed by sampling vertices and edges conditioned on these latent variable samples.…”
Section: Introduction
confidence: 99%
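The two-stage generation process this excerpt describes can be illustrated with a toy sketch: draw latents from a standard-normal prior, then sample vertices and edges from Bernoulli distributions whose parameters are decoded from the latents. The weight matrices here are random stand-ins invented for illustration; real graph latent-variable models learn neural decoders. Note that no vertex ordering is needed, since all vertices and edges are sampled conditionally independently given Z.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 6  # number of latent variables and vertices (illustrative)

# Step 1: sample latent variables Z from their prior (standard normal here).
z = rng.normal(size=k)

# Hypothetical decoder weights, invented for this sketch; real graph VAEs
# learn neural decoders for p(V, E | Z).
W_v = rng.normal(size=(k, n))      # latent -> vertex existence logits
W_e = rng.normal(size=(k, n * n))  # latent -> edge existence logits

# Step 2: sample vertices and edges conditioned on the latent sample.
vertex_probs = 1.0 / (1.0 + np.exp(-(z @ W_v)))
edge_probs = (1.0 / (1.0 + np.exp(-(z @ W_e)))).reshape(n, n)

vertices = rng.random(n) < vertex_probs          # Bernoulli vertex samples
upper = np.triu(rng.random((n, n)) < edge_probs, 1)
adjacency = upper | upper.T                      # symmetric, undirected graph

print("vertices:", vertices.astype(int))
print("adjacency:\n", adjacency.astype(int))
```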