Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-2074

Self-Attention with Relative Position Representations

Abstract: Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements.
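
To make the extension concrete, below is a minimal single-head sketch of self-attention with clipped relative position representations in the spirit of Shaw et al. (2018). The dimensions, clipping distance k, and random weights are illustrative assumptions, not the paper's configuration.

```python
# Minimal single-head sketch of self-attention with relative position
# representations (Shaw et al., 2018). Sizes and weights are illustrative.
import numpy as np

def relative_position_matrix(n, k):
    """r[i, j] = clip(j - i, -k, k) + k, shifted into [0, 2k] for embedding lookup."""
    idx = np.arange(n)
    rel = idx[None, :] - idx[:, None]          # j - i
    return np.clip(rel, -k, k) + k

def relative_self_attention(x, Wq, Wk, Wv, a_key, a_val, k):
    """x: (n, d_model); a_key, a_val: (2k+1, d_head) learned relative embeddings."""
    n, _ = x.shape
    d_head = Wq.shape[1]
    q, key, val = x @ Wq, x @ Wk, x @ Wv        # (n, d_head) each

    rel = relative_position_matrix(n, k)        # (n, n) indices into a_key / a_val
    ak, av = a_key[rel], a_val[rel]             # (n, n, d_head) each

    # e_ij = q_i . (k_j + a^K_ij) / sqrt(d_head)
    logits = (q @ key.T + np.einsum('id,ijd->ij', q, ak)) / np.sqrt(d_head)
    alpha = np.exp(logits - logits.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)  # row-wise softmax

    # z_i = sum_j alpha_ij * (v_j + a^V_ij)
    return alpha @ val + np.einsum('ij,ijd->id', alpha, av)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_model, d_head, k = 5, 16, 8, 2
    x = rng.normal(size=(n, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    a_key = rng.normal(size=(2 * k + 1, d_head))
    a_val = rng.normal(size=(2 * k + 1, d_head))
    print(relative_self_attention(x, Wq, Wk, Wv, a_key, a_val, k).shape)  # (5, 8)
```

Clipping the relative distance to [-k, k] keeps the number of learned relative embeddings fixed at 2k + 1, independent of sequence length.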

Cited by 1,688 publications (1,238 citation statements) | References 11 publications

“…Figure 1: Illustration of (a) the standard sequential position encoding (Vaswani et al., 2017; Shaw et al., 2018) and (b) the proposed structural position encoding. The relative position in the example is for the word "talk".…”
Section: Absolute Position
confidence: 99%
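
As a toy illustration of the sequential relative positions the caption describes (the citing paper's figure is not reproduced here, so the sentence below is an invented example), the offsets with respect to "talk" can be computed directly:

```python
# Illustrative only: sequential relative positions of each token
# with respect to the word "talk" in an invented example sentence.
tokens = ["we", "will", "talk", "about", "it"]
center = tokens.index("talk")
offsets = [j - center for j in range(len(tokens))]
print(list(zip(tokens, offsets)))
# [('we', -2), ('will', -1), ('talk', 0), ('about', 1), ('it', 2)]
```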
“…ABSPE(abs_seq) and ABSPE(abs_stru) are the absolute sequential and structural position embeddings in Eq. 3 and Eq. 5, respectively. For the relative position, we follow Shaw et al. (2018) in extending the self-attention computation to consider pairwise relationships, projecting the relative structural position as described in Eq.…”
Section: Integrating Structural PE into SANs
confidence: 99%
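
The pairwise extension from Shaw et al. (2018) that this statement follows adds a learned relative-position term to the attention logits; in that paper's notation (the citing work's own equation, truncated above, projects a structural rather than sequential relative position):

```latex
% Attention logits with a relative-position key term a^K_{ij} (Shaw et al., 2018).
\begin{equation}
  e_{ij} = \frac{\left(x_i W^Q\right)\left(x_j W^K + a^K_{ij}\right)^{\top}}{\sqrt{d_z}}
\end{equation}
```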
“…In light of this, we propose the Relative Temporal Encoding (RTE) mechanism to model the dynamic dependencies in heterogeneous graphs. RTE is inspired by the Transformer's positional encoding method [15, 21], which has been shown to successfully capture the sequential dependencies of words in long texts.…”
Section: Relative Temporal Encoding
confidence: 99%
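
For context, here is a generic sketch of a sinusoidal encoding applied to a relative time gap, in the spirit of the Transformer positional encodings the statement cites; the exact RTE formulation belongs to the citing work, so the dimension, base, and function name below are assumptions.

```python
# Generic sketch: sinusoidal encoding of a scalar time difference,
# analogous to Transformer positional encodings (not the cited RTE itself).
import numpy as np

def sinusoidal_encoding(delta_t, d_model=16, base=10000.0):
    """Map a scalar time difference to a d_model-dimensional vector."""
    i = np.arange(d_model // 2)
    freqs = delta_t / (base ** (2 * i / d_model))
    return np.concatenate([np.sin(freqs), np.cos(freqs)])

print(sinusoidal_encoding(3.0).shape)  # (16,)
```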
“…Besides, to capture relative position representations, we incorporate the self-attentive mechanism of Shaw et al. (2018) during the encoding process. Typically, each output element h_i is computed as a weighted sum of linearly transformed input elements…”
Section: Self-attentive Recursive Autoencoder
confidence: 99%
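
The weighted sum mentioned in this statement, written with the relative-position value term a^V_{ij} of Shaw et al. (2018) and using h_i for the output to match the quoted notation:

```latex
% Softmax attention weights and output as a weighted sum of linearly
% transformed inputs plus relative-position value embeddings.
\begin{align}
  \alpha_{ij} &= \frac{\exp e_{ij}}{\sum_{k=1}^{n} \exp e_{ik}}, &
  h_i &= \sum_{j=1}^{n} \alpha_{ij}\left(x_j W^V + a^V_{ij}\right)
\end{align}
```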