Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1437

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

Abstract: Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In t…
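To make the contrast in the abstract concrete, here is a minimal toy sketch (not FlowSeq's actual architecture; toy_ar_step and toy_nar_model are hypothetical stand-ins for trained networks) of the difference between autoregressive decoding, which needs one model call per output token, and non-autoregressive decoding, which fills every position in a single parallel pass:

import random

VOCAB = ["the", "cat", "sat", "down", "<eos>"]

def toy_ar_step(src, prefix):
    """Hypothetical stand-in for an autoregressive decoder step:
    returns the next token given the source and the generated prefix."""
    random.seed(len(src) + len(prefix))  # deterministic toy behaviour
    return random.choice(VOCAB)

def toy_nar_model(src, length):
    """Hypothetical stand-in for a non-autoregressive decoder:
    predicts every target position in one parallel pass."""
    random.seed(len(src) + length)
    return [random.choice(VOCAB) for _ in range(length)]

def autoregressive_decode(src, max_len=10):
    # T sequential model calls: each token conditions on the tokens before it.
    out = []
    for _ in range(max_len):
        tok = toy_ar_step(src, out)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def non_autoregressive_decode(src, length=4):
    # One model call fills all positions at once; because positions are
    # predicted without seeing each other here, capturing token-to-token
    # dependencies is the central modeling challenge.
    return toy_nar_model(src, length)

if __name__ == "__main__":
    print(autoregressive_decode("le chat"))
    print(non_autoregressive_decode("le chat"))

The single parallel pass is what yields the GPU speedup mentioned in the abstract, but it also drops the conditioning on previously generated tokens; closing the resulting accuracy gap is what FlowSeq's generative flow over latent variables targets.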

Cited by 128 publications (135 citation statements); References 25 publications.

Citation statements, ordered by relevance:
“…Non-autoregressive neural machine translation (Gu et al., 2018) aims to enable the parallel generation of output tokens without sacrificing translation quality. There has been a surge of recent interest in this family of efficient decoding models, resulting in the development of iterative refinement (Lee et al., 2018), CTC models (Libovicky and Helcl, 2018), insertion-based methods (Chan et al., 2019b), edit-based methods (Gu et al., 2019; Ruis et al., 2019), masked language models (Ghazvininejad et al., 2019, 2020b), and normalizing flow models (Ma et al., 2019). Some of these methods generate the output tokens in a constant number of steps (Gu et al., 2018; Libovicky and Helcl, 2018; Lee et al., 2018; Ghazvininejad et al., 2019, 2020b), while others require a logarithmic number of generation steps (Chan et al., 2019b,a; Li and Chan, 2019).…”
Section: Introduction
Mentioning confidence: 99%
“…Decoder Architecture. Many latent variable models for text use LSTMs (Hochreiter and Schmidhuber, 1997) as their decoders (Yang et al., 2017; Ziegler and Rush, 2019; Ma et al., 2019). However, state-of-the-art models in neural machine translation have seen increased performance and speed using deep Transformer architectures.…”
Section: Model
Mentioning confidence: 99%
“…On the other hand, Ma et al. (2019) and Shu et al. (2020) proposed to use continuous latent variables for non-autoregressive translation. By letting the latent variables z (of dimensionality T × D) capture the dependencies between the target tokens, the decoder p_θ(y|z, x) can be factorized over time.…”
Section: Refinement in a Hybrid Space
Mentioning confidence: 99%
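Written out, the factorization described in the excerpt above takes roughly the following form (a sketch of the general latent-variable setup, not the exact notation of the cited papers; here z is a length-T sequence of D-dimensional latent vectors):

p_\theta(y \mid x) = \int p_\theta(z \mid x)\, p_\theta(y \mid z, x)\, dz,
\qquad
p_\theta(y \mid z, x) = \prod_{t=1}^{T} p_\theta(y_t \mid z, x).

Because the product over t has no dependence on previously generated tokens y_{<t}, all target positions can be decoded in parallel; the cost, as the next excerpt notes, is that finding the most likely y now involves searching (or approximating a search) over the continuous latent variables z.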
“…Meanwhile, another line of work proposed to use continuous latent variables for non-autoregressive translation, such that the distribution of the target sentences can be factorized over time given the latent variables (Ma et al., 2019; Shu et al., 2020). Unlike the models discussed above, finding the most likely target sentence under these models requires searching over continuous latent variables.…”
Section: Introduction
Mentioning confidence: 99%