Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1117
Dense Information Flow for Neural Machine Translation

Abstract: Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework. From the optimization perspective, residual connections are adopted to improve learning performance for both encoder and decoder in most of these deep architectures, and advanced attention connections are applied as well. Inspired by the success of the DenseNet model in computer vision problems, in this paper, we propose a densely connected NMT architecture…
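The dense connectivity the abstract refers to can be illustrated with a minimal sketch (this is an assumption-laden toy, not the paper's exact model): in a DenseNet-style stack, each layer receives the concatenation of the original input and the outputs of all preceding layers, rather than only the previous layer's output.

```python
# Toy sketch of DenseNet-style dense connections.
# Layer names and shapes are hypothetical; real NMT layers operate on
# hidden-state tensors, not flat Python lists.

def dense_forward(x, layers):
    """Run a stack of layers with dense connections.

    Each layer sees the concatenation of the input and every earlier
    layer's output, instead of only the immediately preceding output.
    """
    features = [x]  # all representations produced so far
    for layer in layers:
        concatenated = [v for feat in features for v in feat]
        features.append(layer(concatenated))
    return features[-1]

# Toy layers: each simply sums its (growing) input into a length-1 vector.
layers = [lambda v: [sum(v)] for _ in range(3)]
out = dense_forward([1.0, 2.0], layers)  # each layer's input keeps growing
```

Note how the input to each successive layer grows: this growing, concatenated feature set is what distinguishes dense connections from residual connections, which add rather than concatenate.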

Cited by 32 publications (23 citation statements); references 12 publications.
“…Neural Machine Translation Given the bilingual translation pair (x, y), an NMT model learns the parameter θ by maximizing the log-likelihood log P(y|x, θ). The encoder-decoder framework (Bahdanau et al., 2015; Luong et al., 2015b; Sutskever et al., 2014; Wu et al., 2016; Gehring et al., 2017; Vaswani et al., 2017; Shen et al., 2018) is adopted to model the conditional probability P(y|x, θ), where the encoder maps the input to a set of hidden representations h and the decoder generates each target token.¹ Many-to-many translation can be bridged through many-to-one and one-to-many translations. Our methods can also be extended to the many-to-many setting with some modifications.…”
Section: Introduction (mentioning)
confidence: 99%
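The training objective quoted above, maximizing log P(y|x, θ), decomposes over target tokens as a sum of per-token conditional log-probabilities. A minimal sketch of that decomposition (the "model" here is a hypothetical fixed distribution, purely to show the arithmetic):

```python
import math

# Sketch of the NMT maximum-likelihood objective:
#   log P(y | x, theta) = sum_t log P(y_t | y_<t, x, theta)
# token_prob stands in for a real conditional model P(y_t | y_<t, x).

def sequence_log_likelihood(target_tokens, token_prob):
    """Sum of per-token log-probabilities, the quantity NMT training maximizes."""
    return sum(math.log(token_prob(tok)) for tok in target_tokens)

# Toy conditional distribution: every token gets probability 0.5.
ll = sequence_log_likelihood(["a", "b", "c"], lambda tok: 0.5)
```

In practice the gradient of this sum with respect to θ is what the optimizer follows; the toy above only shows how the sequence-level likelihood factors into token-level terms.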
“…Recent studies show that different encoder layers capture linguistic properties of different levels (Peters et al., 2018), and aggregating layers is of profound value for better fusing semantic information (Shen et al., 2018; Dou et al., 2018; Dou et al., 2019). We assume that different decoder layers may value different levels of information, i.e.…”
Section: Input (mentioning)
confidence: 99%
“…Concerning natural language processing, Peters et al. (2018) have found that combining different layers is helpful, and their model significantly improves state-of-the-art models on various tasks. Researchers have also explored fusing information for NMT models and demonstrated that aggregating layers is also useful for NMT (Shen et al., 2018; Wang et al., 2018; Dou et al., 2018). However, all of these works mainly focus on static aggregation, in that their aggregation strategy is independent of specific hidden states.…”
Section: Related Work (mentioning)
confidence: 99%
“…Fusing information across layers for deep NMT models, however, has received substantially less attention. A few recent studies reveal that simultaneously exposing all layer representations outperforms methods that utilize just the top layer for natural language processing tasks (Peters et al., 2018; Shen et al., 2018; Wang et al., 2018; Dou et al., 2018). However, their methods mainly focus on static aggregation, in that the aggregation mechanisms are the same across different positions in the sequence.…”
Section: Introduction (mentioning)
confidence: 99%
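The "static aggregation" these citing papers contrast with dynamic approaches can be sketched as a fixed weighted combination of per-layer hidden states, with the same weights at every position in the sequence (names and shapes below are illustrative, not from any of the cited papers):

```python
# Sketch of static layer aggregation: per-layer hidden states for one
# position are fused with one fixed, position-independent weight vector.

def static_aggregate(layer_states, weights):
    """Combine per-layer hidden states with position-independent weights."""
    dim = len(layer_states[0])
    return [sum(w * state[i] for w, state in zip(weights, layer_states))
            for i in range(dim)]

# Three layers' outputs (2-dim states) at one position, fused with
# fixed weights that would be reused unchanged at every other position.
fused = static_aggregate([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
                         [0.5, 0.25, 0.25])
```

A dynamic scheme would instead compute the weights from the hidden states themselves (e.g. via attention), so that different positions could emphasize different layers; the fixed `weights` argument is exactly what makes the version above "static".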