Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1150

Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation

Abstract: This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical and phrase vectors. Using a tree-based rare word encoding, the proposed model is extended to the sub-word level to alleviate the out-of-vocabulary problem.
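The weighted attention variant described above can be made concrete. Below is a minimal PyTorch sketch, assuming a standard additive-attention decoder: two attentive read-outs, one over word-level (lexical) annotations and one over phrase-level (tree-node) annotations, combined by a learned scalar gate. The class name, scoring layers, and gating form are illustrative assumptions, not the paper's released code.

```python
# Sketch of a weighted attention that balances lexical and phrase
# context vectors. Shapes and names are illustrative, not the paper's.
import torch
import torch.nn as nn


class WeightedLexicalPhraseAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.word_score = nn.Linear(hidden_size * 2, 1)    # scores decoder state vs. word annotation
        self.phrase_score = nn.Linear(hidden_size * 2, 1)  # scores decoder state vs. phrase annotation
        self.gate = nn.Linear(hidden_size * 3, 1)          # balances the two context vectors

    def _context(self, state, annotations, scorer):
        # state: (batch, hidden); annotations: (batch, n, hidden)
        n = annotations.size(1)
        expanded = state.unsqueeze(1).expand(-1, n, -1)
        scores = scorer(torch.cat([expanded, annotations], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)            # attention over positions
        return torch.bmm(weights.unsqueeze(1), annotations).squeeze(1)

    def forward(self, state, word_annotations, phrase_annotations):
        c_word = self._context(state, word_annotations, self.word_score)
        c_phrase = self._context(state, phrase_annotations, self.phrase_score)
        # Scalar gate in (0, 1) decides how much lexical vs. phrase
        # information feeds the next decoder step.
        g = torch.sigmoid(self.gate(torch.cat([state, c_word, c_phrase], dim=-1)))
        return g * c_word + (1.0 - g) * c_phrase
```

In a decoder loop, this module would stand in for the usual single attention read-out; the gate lets the model lean on phrase-level context where structure matters and on lexical context elsewhere.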

Cited by 24 publications (15 citation statements); references 17 publications.
“…As seen in Figure 4 (b) and (c), the performance of pre-trained NMT encoders clearly became worse on long-distance cases across language pairs and model variants. This is consistent with prior observations on NMT systems that both RNN and SAN fail to fully capture long-distance dependencies (Tai et al., 2015; Yang et al., 2017; Tang et al., 2018).…”
Section: Discussion (supporting)
confidence: 92%
“…Therefore, we apply locality modeling only to the lower layers, matching the configuration in Yu et al. (2018). In this way, the representations are learned in a hierarchical fashion (Yang et al., 2017). That is, the distance-aware, local information extracted by the lower SAN layers is expected to complement the distance-agnostic, global information captured by the higher SAN layers.…”
Section: Locality Modeling via 1D Convolution (mentioning)
confidence: 99%
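The quoted configuration, local modeling in the lower layers and global modeling above, can be sketched as follows, assuming a Transformer-style encoder stack: the lower layers use 1D convolutions to extract distance-aware, local features, and the higher layers use ordinary self-attention for global dependencies. Layer counts, kernel size, and the residual wiring are illustrative assumptions.

```python
# Sketch: convolutional (local) lower layers feeding self-attentive
# (global) upper layers, so representations form a hierarchy.
import torch
import torch.nn as nn


class LocalThenGlobalEncoder(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_local=2, n_global=4, kernel=3):
        super().__init__()
        # Lower layers: 1D convolutions capture local, distance-aware patterns.
        self.local_layers = nn.ModuleList(
            nn.Conv1d(d_model, d_model, kernel, padding=kernel // 2)
            for _ in range(n_local)
        )
        # Higher layers: standard self-attention captures global dependencies.
        self.global_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_global)
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        h = x.transpose(1, 2)            # Conv1d expects (batch, d_model, seq_len)
        for conv in self.local_layers:
            h = torch.relu(conv(h)) + h  # residual keeps the original signal
        h = h.transpose(1, 2)
        for layer in self.global_layers:
            h = layer(h)
        return h
```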
“…We refer to models containing this top-down pass as top-down tree2seq models. Note that such a top-down pass has been shown to aid in tree-based NMT with supervised syntactic information (Chen et al., 2017a; Yang et al., 2017); here, we add it to our unsupervised hierarchies.…”
Section: Top-Down Encoder Pass (mentioning)
confidence: 99%
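A top-down pass of the kind cited here can be sketched as follows, assuming a bottom-up pass has already assigned each tree node a state: a shared GRU cell propagates a downward state from the root, so every node's representation is conditioned on its ancestors. The Node structure and cell choice are illustrative assumptions, not the cited papers' exact formulation.

```python
# Sketch of a top-down encoder pass over a parse tree: each node's
# downward state is derived from its parent's downward state and its
# own bottom-up state.
from dataclasses import dataclass, field
from typing import List

import torch
import torch.nn as nn


@dataclass
class Node:
    bottom_up: torch.Tensor            # state from a prior bottom-up pass, (hidden,)
    children: List["Node"] = field(default_factory=list)
    top_down: torch.Tensor = None      # filled in by the pass below


def top_down_pass(root: Node, cell: nn.GRUCell, hidden: int) -> None:
    # The root has no parent, so its downward state starts from zeros.
    root.top_down = cell(root.bottom_up.unsqueeze(0),
                         torch.zeros(1, hidden)).squeeze(0)
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            # The parent's downward state conditions the child's downward state.
            child.top_down = cell(child.bottom_up.unsqueeze(0),
                                  node.top_down.unsqueeze(0)).squeeze(0)
            stack.append(child)


# Usage: two leaves under a root, 16-dimensional states.
hidden = 16
cell = nn.GRUCell(hidden, hidden)
leaves = [Node(torch.randn(hidden)) for _ in range(2)]
root = Node(torch.randn(hidden), children=leaves)
top_down_pass(root, cell, hidden)
```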
“…Eriguchi et al. (2016) introduced a tree-LSTM encoder for NMT that relied on an external parser to parse the training and test data. The tree-LSTM encoder was improved upon by Chen et al. (2017a) and Yang et al. (2017), who added a top-down pass. Other approaches have used convolutional networks to model source syntax.…”
Section: Related Work (mentioning)
confidence: 99%