Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1437

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

Abstract: Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In t…
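To make the contrast in the abstract concrete, here is a minimal toy sketch (not FlowSeq's actual architecture; toy_ar_step and toy_nar_model are hypothetical stand-ins for trained networks) of the difference between autoregressive decoding, which needs one model call per output token, and non-autoregressive decoding, which fills every position in a single parallel pass:

import random

VOCAB = ["the", "cat", "sat", "down", "<eos>"]

def toy_ar_step(src, prefix):
    """Hypothetical stand-in for an autoregressive decoder step:
    returns the next token given the source and the generated prefix."""
    random.seed(len(src) + len(prefix))  # deterministic toy behaviour
    return random.choice(VOCAB)

def toy_nar_model(src, length):
    """Hypothetical stand-in for a non-autoregressive decoder:
    predicts every target position in one parallel pass."""
    random.seed(len(src) + length)
    return [random.choice(VOCAB) for _ in range(length)]

def autoregressive_decode(src, max_len=10):
    # T sequential model calls: each token conditions on the tokens before it.
    out = []
    for _ in range(max_len):
        tok = toy_ar_step(src, out)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def non_autoregressive_decode(src, length=4):
    # One model call fills all positions at once; because positions are
    # predicted without seeing each other here, capturing token-to-token
    # dependencies is the central modeling challenge.
    return toy_nar_model(src, length)

if __name__ == "__main__":
    print(autoregressive_decode("le chat"))
    print(non_autoregressive_decode("le chat"))

The single parallel pass is what yields the GPU speedup mentioned in the abstract, but it also drops the conditioning on previously generated tokens; closing the resulting accuracy gap is what FlowSeq's generative flow over latent variables targets.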

Cited by 128 publications (135 citation statements); References 25 publications.

Citation statements, ordered by relevance:
“…Non-autoregressive neural machine translation (Gu et al., 2018) aims to enable the parallel generation of output tokens without sacrificing translation quality. There has been a surge of recent interest in this family of efficient decoding models, resulting in the development of iterative refinement (Lee et al., 2018), CTC models (Libovicky and Helcl, 2018), insertion-based methods (Chan et al., 2019b), edit-based methods (Gu et al., 2019; Ruis et al., 2019), masked language models (Ghazvininejad et al., 2019, 2020b), and normalizing flow models (Ma et al., 2019). Some of these methods generate the output tokens in a constant number of steps (Gu et al., 2018; Libovicky and Helcl, 2018; Lee et al., 2018; Ghazvininejad et al., 2019, 2020b), while others require a logarithmic number of generation steps (Chan et al., 2019b,a; Li and Chan, 2019).…”
Section: Introduction
Mentioning confidence: 99%
“…Decoder Architecture. Many latent variable models for text use LSTMs (Hochreiter and Schmidhuber, 1997) as their decoders (Yang et al., 2017; Ziegler and Rush, 2019; Ma et al., 2019). However, state-of-the-art models in neural machine translation have seen increased performance and speed using deep Transformer architectures.…”
Section: Model
Mentioning confidence: 99%
“…On the other hand, Ma et al. (2019) and Shu et al. (2020) proposed to use continuous latent variables for non-autoregressive translation. By letting the latent variables z (of dimensionality T × D) capture the dependencies between the target tokens, the decoder p_θ(y|z, x) can be factorized over time.…”
Section: Refinement in a Hybrid Space
Mentioning confidence: 99%
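Written out, the factorization described in the excerpt above takes roughly the following form (a sketch of the general latent-variable setup, not the exact notation of the cited papers; here z is a length-T sequence of D-dimensional latent vectors):

p_\theta(y \mid x) = \int p_\theta(z \mid x)\, p_\theta(y \mid z, x)\, dz,
\qquad
p_\theta(y \mid z, x) = \prod_{t=1}^{T} p_\theta(y_t \mid z, x).

Because the product over t has no dependence on previously generated tokens y_{<t}, all target positions can be decoded in parallel; the cost, as the next excerpt notes, is that finding the most likely y now involves searching (or approximating a search) over the continuous latent variables z.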
“…Meanwhile, another line of work proposed to use continuous latent variables for non-autoregressive translation, such that the distribution of the target sentences can be factorized over time given the latent variables (Ma et al., 2019; Shu et al., 2020). Unlike the models discussed above, finding the most likely target sentence under these models requires searching over continuous latent variables.…”
Section: Introduction
Mentioning confidence: 99%