2019
DOI: 10.1609/aaai.v33i01.33013723

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

Abstract: Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the inputs of the decoder is important and largely impacts the model accuracy. In this paper, we propose two methods to enhance the decoder inputs so as to improve NAT models. The first one directly leverages a phra…
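
To make the abstract's distinction concrete, here is a minimal sketch contrasting autoregressive decoding (one token per step, conditioned on previously emitted tokens) with non-autoregressive decoding (all positions scored in parallel, with decoder inputs that do not depend on previous target tokens). All names and the random scoring below are illustrative assumptions, not the paper's actual model.

```python
# Minimal, runnable sketch (not the paper's implementation) contrasting
# autoregressive (AT) and non-autoregressive (NAT) decoding with toy scores.
import numpy as np

VOCAB = ["<pad>", "hello", "world", "!", "</s>"]
rng = np.random.default_rng(0)

def autoregressive_decode(src_repr, max_len=4):
    """Emit tokens one by one; each step may look at previously emitted tokens."""
    out = []
    for _ in range(max_len):
        # A real decoder would condition on src_repr and on `out`; here we use
        # random scores only to show the sequential, step-by-step structure.
        scores = rng.random(len(VOCAB))
        out.append(int(scores.argmax()))
    return out

def non_autoregressive_decode(src_repr, tgt_len=4):
    """Score every target position at once: decoder inputs do not depend on
    previously generated tokens, so all positions can be decoded in parallel."""
    scores = rng.random((tgt_len, len(VOCAB)))  # [tgt_len, vocab] in one shot
    return scores.argmax(axis=1).tolist()

src = np.zeros(8)  # dummy source representation
print("AT :", [VOCAB[i] for i in autoregressive_decode(src)])
print("NAT:", [VOCAB[i] for i in non_autoregressive_decode(src)])
```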

Cited by 104 publications (99 citation statements). References 10 publications.
“…[20, 21] combined joint n-gram models with Bi-LSTM models and achieved good performance in G2P conversion. [5] adopted a convolutional sequence-to-sequence model and proposed non-sequential decoding [22] for G2P conversion, which achieved the previous state-of-the-art result on the public CMUDict 0.7b dataset.…”
Section: Grapheme-to-Phoneme Conversion (mentioning)
confidence: 74%
“…There are many design choices in the encoder-decoder framework based on different types of layers, such as RNN-based (Sutskever et al., 2014), CNN-based (Gehring et al., 2017), and self-attention based (Vaswani et al., 2017). In terms of speeding up the decoding of the neural Transformer, Gu et al. (2017) modified the autoregressive architecture to directly generate target words in parallel. In the past two years, non-autoregressive and semi-autoregressive models have been extensively studied (Oord et al., 2017; Kaiser et al., 2018; Lee et al., 2018; Libovický and Helcl, 2018; Wang et al., 2019; Guo et al., 2018; Zhou et al., 2019a). Previous work shows that NAT can be improved via knowledge distillation from AT models.…”
Section: Related Work (mentioning)
confidence: 99%
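
The citing work notes that NAT models are commonly improved via knowledge distillation from an AT teacher. Below is a minimal sketch of sequence-level distillation data preparation under that idea; `at_teacher_translate` and `build_distilled_corpus` are hypothetical stand-ins, not any paper's actual API.

```python
# Sketch of sequence-level knowledge distillation for NAT training: the NAT
# "student" is trained on the AT "teacher"'s outputs instead of the original
# references. The teacher here is a stub, not a real translation model.
def at_teacher_translate(src_sentence: str) -> str:
    """Stand-in for a trained autoregressive teacher (e.g. a Transformer with beam search)."""
    return src_sentence.upper()  # placeholder output for illustration

def build_distilled_corpus(parallel_corpus):
    # Replace every reference with the teacher's translation of the source;
    # the resulting targets are less multimodal and easier for a NAT model to fit.
    return [(src, at_teacher_translate(src)) for src, _ref in parallel_corpus]

corpus = [("guten morgen", "good morning"), ("danke schön", "thank you")]
print(build_distilled_corpus(corpus))
```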
“…Due to the multimodality problem [13], the performance of the NAR model is usually inferior to that of the AR model. Recently, a line of work aiming to bridge the performance gap between NAR and AR models on the translation task has been presented [11,14].…”
Section: Non-Autoregressive Decoding (mentioning)
confidence: 99%
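
For readers unfamiliar with the multimodality problem cited above, the toy snippet below shows how independent per-position predictions can mix two equally valid references into an output that matches neither; the references and the tie-breaking rule are invented for illustration.

```python
# Toy illustration of the multimodality problem: with two equally likely
# references, per-position predictions made independently can interleave them.
refs = [["thank", "you", "so", "much"],
        ["thanks", "a", "lot", "<pad>"]]

# If both references are equally likely, each position has two candidates with
# equal marginal probability; breaking ties independently per position can
# produce a mode-mixed output that belongs to neither reference.
mixed = [refs[i % 2][i] for i in range(4)]
print(mixed)  # ['thank', 'a', 'so', '<pad>'] -- matches neither reference
```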