Self-Attention with Relative Position Representations

Shaw, Peter; Uszkoreit, Jakob; Vaswani, Ashish

doi:10.18653/v1/n18-2074

Cited by 1,688 publications (1,238 citation statements; 1,227 mentioning)

References 11 publications

“…Figure 1: Illustration of (a) the standard sequential position encoding (Vaswani et al., 2017; Shaw et al., 2018), and (b) the proposed structural position encoding. The relative position in the example is for the word "talk".…”

Section: Absolute Position (mentioning)

“…$\mathrm{ABSPE}(abs_{seq})$ and $\mathrm{ABSPE}(abs_{stru})$ are the absolute sequential and structural position embeddings in Eq. 3 and Eq. 5, respectively. For the relative position, we follow Shaw et al. (2018) to extend the self-attention computation to consider the pairwise relationships and project the relative structural position as described in Eq.…”

Section: Integrating Structural PE into SANs (mentioning)
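Shaw et al. (2018) extend self-attention with pairwise relationships by adding learned relative-position vectors to the keys. The distance j - i is clipped to a maximum k, so that $e_{ij} = x_i W^Q (x_j W^K + a^K_{ij})^\top / \sqrt{d_z}$ with $a^K_{ij} = w^K_{\mathrm{clip}(j-i,\,k)}$. The following is a minimal single-head, unbatched NumPy sketch of the key-side term. The function names are hypothetical, and this is not code from either paper.

```python
import numpy as np

def clip_rel(n, k):
    """Index table: relative distance j - i, clipped to [-k, k], shifted to [0, 2k]."""
    idx = np.arange(n)
    rel = idx[None, :] - idx[:, None]          # rel[i, j] = j - i
    return np.clip(rel, -k, k) + k

def relative_attention_weights(x, W_q, W_k, w_k, k):
    """
    Attention weights alpha_ij with relative-position keys (hypothetical helper).
    x: (n, d_x) inputs; W_q, W_k: (d_x, d_z); w_k: (2k+1, d_z) learned table.
    """
    n = x.shape[0]
    q = x @ W_q                                # (n, d_z) queries
    key = x @ W_k                              # (n, d_z) content keys
    a_k = w_k[clip_rel(n, k)]                  # (n, n, d_z), a^K_ij
    d_z = q.shape[-1]
    # content compatibility plus query-to-relative-position compatibility
    e = (q @ key.T + np.einsum("id,ijd->ij", q, a_k)) / np.sqrt(d_z)
    e = e - e.max(axis=1, keepdims=True)       # softmax numerical stability
    alpha = np.exp(e)
    return alpha / alpha.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                    # 5 tokens, model width 8
alpha = relative_attention_weights(
    x, rng.normal(size=(8, 4)), rng.normal(size=(8, 4)), rng.normal(size=(5, 4)), k=2)
print(alpha.shape)  # (5, 5)
```

Clipping keeps the table at 2k + 1 entries no matter how long the sentence is. Setting `w_k` to zeros recovers ordinary scaled dot-product attention.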


Self-Attention with Structural Position Representations

Wang, Tu, Wang et al., 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


Although self-attention networks (SANs) have advanced the state of the art on various NLP tasks, one criticism of SANs is their limited ability to encode the positions of input words (Shaw et al., 2018). In this work, we propose to augment SANs with structural position representations that model the latent structure of the input sentence, complementing the standard sequential position representations. Specifically, we use a dependency tree to represent the grammatical structure of a sentence and propose two strategies to encode the positional relationships among words in the tree. Experimental results on the NIST Chinese⇒English and WMT14 English⇒German translation tasks show that the proposed approach consistently improves performance over both absolute and relative sequential position representations.
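The sketch below makes the contrast between sequential and structural positions concrete. It derives two simple quantities from a dependency parse: each word's depth below the root, and the number of tree edges between every pair of words. These quantities are assumptions chosen for illustration. The abstract does not spell out the paper's two encoding strategies, so the exact definitions may differ. The function names are hypothetical, and the example parse is hand-written, not taken from the paper.

```python
from collections import deque

def tree_depths(heads):
    """Depth of each word (root = 0); heads[i] is word i's head index, -1 for the root.
    Illustrative stand-in for an 'absolute structural position'."""
    depths = [None] * len(heads)

    def depth(i):
        if depths[i] is None:
            depths[i] = 0 if heads[i] == -1 else depth(heads[i]) + 1
        return depths[i]

    return [depth(i) for i in range(len(heads))]

def tree_distances(heads):
    """Pairwise edge counts in the undirected dependency tree (BFS from every word).
    Illustrative stand-in for a 'relative structural position'."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head != -1:
            adj[child].append(head)
            adj[head].append(child)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist

# Hypothetical hand-written parse of "Bush held a talk with Sharon"
words = ["Bush", "held", "a", "talk", "with", "Sharon"]
heads = [1, -1, 3, 1, 3, 4]
print(tree_depths(heads))               # [1, 0, 2, 1, 2, 3]
print(tree_distances(heads)[3])         # [2, 1, 1, 0, 1, 2] (structural, from "talk")
print([j - 3 for j in range(6)])        # [-3, -2, -1, 0, 1, 2] (sequential, from "talk")
```

Relative to "talk", "Bush" is three tokens away in the sequence but only two edges away in the tree. This kind of difference is what a structural encoding can expose to the attention layer.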


“…In light of this, we propose the Relative Temporal Encoding (RTE) mechanism to model the dynamic dependencies in heterogeneous graphs. RTE is inspired by the Transformer's positional encoding method [15,21], which has been shown to successfully capture the sequential dependencies of words in long texts.…”

Section: Relative Temporal Encoding (mentioning)
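A relative temporal encoding in this spirit can be sketched as follows:

1. Encode the time gap ΔT between a target and a source node with Transformer-style sinusoids.
2. Project the result linearly.
3. Add it to the source node's representation.

This is a minimal sketch under those assumptions. The projection `W`, `b` and the function names are hypothetical, and the exact frequencies and projection used in HGT may differ.

```python
import numpy as np

def sinusoid_basis(delta_t, d):
    """Transformer-style sin/cos features of a time gap; d must be even."""
    i = np.arange(d // 2)
    freq = 1.0 / (10000 ** (2 * i / d))                        # one frequency per sin/cos pair
    angles = np.asarray(delta_t, dtype=float)[..., None] * freq
    out = np.empty(angles.shape[:-1] + (d,))
    out[..., 0::2] = np.sin(angles)                            # even slots: sine
    out[..., 1::2] = np.cos(angles)                            # odd slots: cosine
    return out

def relative_temporal_encoding(h_source, t_target, t_source, W, b):
    """
    Add a projected encoding of (t_target - t_source) to each source representation.
    h_source: (m, d_h); t_source: (m,); W: (d, d_h) and b: (d_h,) are a hypothetical linear layer.
    """
    delta = np.asarray(t_target) - np.asarray(t_source)        # (m,) time gaps
    rte = sinusoid_basis(delta, W.shape[0]) @ W + b            # (m, d_h)
    return h_source + rte
```

Because the encoding depends only on the gap ΔT, two edges with the same gap receive the same encoding regardless of their absolute timestamps. This mirrors how relative positions work for word sequences.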


Heterogeneous Graph Transformer

Dong, Wang et al., 2020

Proceedings of the Web Conference 2020


Recent years have witnessed the emerging success of graph neural networks (GNNs) for modeling structured data. However, most GNNs are designed for homogeneous graphs, in which all nodes and edges belong to the same type, making them unable to represent heterogeneous structures. In this paper, we present the Heterogeneous Graph Transformer (HGT) architecture for modeling Web-scale heterogeneous graphs. To model heterogeneity, we design node- and edge-type dependent parameters to characterize the heterogeneous attention over each edge, empowering HGT to maintain dedicated representations for different types of nodes and edges. To handle dynamic heterogeneous graphs, we introduce the relative temporal encoding technique into HGT, which is able to capture the dynamic structural dependency with arbitrary durations. To handle Web-scale graph data, we design the heterogeneous mini-batch graph sampling algorithm, HGSampling, for efficient and scalable training. Extensive experiments on the Open Academic Graph of 179 million nodes and 2 billion edges show that the proposed HGT model consistently outperforms all state-of-the-art GNN baselines by 9% to 21% on various downstream tasks. The dataset and source code of HGT are publicly available at https://github.com/acbull/pyHGT.
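"Node- and edge-type dependent parameters" can be illustrated with a single attention head. Queries and keys are projected with matrices chosen by node type, and their interaction passes through a matrix chosen by edge type. This is a minimal sketch, not HGT's layer. It omits multiple heads, any per-relation prior, and the message-passing and aggregation steps, and the `params` layout and function name are hypothetical.

```python
import numpy as np

def hetero_attention(h_target, h_sources, target_type, source_types, edge_types, params):
    """
    Attention of one target node over its neighbors (single head, hypothetical layout).
    params["Q"][node_type], params["K"][node_type]: (d_in, d) projections per node type
    params["W_att"][edge_type]: (d, d) bilinear interaction per edge type
    """
    q = h_target @ params["Q"][target_type]                 # (d,) type-specific query
    scores = []
    for h_s, s_type, e_type in zip(h_sources, source_types, edge_types):
        k = h_s @ params["K"][s_type]                       # (d,) type-specific key
        scores.append(k @ params["W_att"][e_type] @ q)      # edge-type-specific score
    scores = np.array(scores) / np.sqrt(q.shape[0])
    scores -= scores.max()                                  # softmax numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()
```

For example, a "paper" node attending to two "author" neighbors over "writes" edges would use `params["Q"]["paper"]`, `params["K"]["author"]`, and `params["W_att"]["writes"]`. A "paper" neighbor reached by a "cites" edge would instead use `params["K"]["paper"]` and `params["W_att"]["cites"]`.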


“…Besides, to capture relative position representations, we incorporate the self-attention mechanism (Shaw et al., 2018) into the encoding process. Typically, each output element $h_i$ is computed as a weighted sum of linearly transformed input elements…”

Section: Self-attentive Recursive Autoencoder (mentioning)
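The weighted sum described in this snippet can be sketched as below, extended with the value-side relative term from Shaw et al. (2018): $h_i = \sum_j \alpha_{ij}(x_j W^V + a^V_{ij})$, where $a^V_{ij} = w^V_{\mathrm{clip}(j-i,\,k)}$. The attention weights are taken as given, and the function name and shapes are illustrative assumptions, not code from either paper.

```python
import numpy as np

def relative_attention_output(x, alpha, W_v, w_v, k):
    """
    h_i = sum_j alpha_ij * (x_j W_v + w_v[clip(j - i, -k, k) + k])  (hypothetical helper)
    x: (n, d_x) inputs; alpha: (n, n) attention weights with rows summing to 1;
    W_v: (d_x, d_z) value projection; w_v: (2k+1, d_z) relative-position value table.
    """
    n = x.shape[0]
    idx = np.arange(n)
    rel = np.clip(idx[None, :] - idx[:, None], -k, k) + k   # (n, n) table indices for j - i
    a_v = w_v[rel]                                          # (n, n, d_z), a^V_ij
    v = x @ W_v                                             # (n, d_z) transformed inputs
    # content part: sum_j alpha_ij v_j; positional part: sum_j alpha_ij a^V_ij
    return alpha @ v + np.einsum("ij,ijd->id", alpha, a_v)
```

With `w_v` set to zeros, this reduces to the plain weighted sum of linearly transformed inputs.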


Towards Controllable and Personalized Review Generation

Li, Tuzhilin, 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen


In this paper, we propose RevGAN, a novel model that automatically generates controllable and personalized user reviews based on arbitrarily given sentiment and stylistic information. RevGAN combines three novel components: self-attentive recursive autoencoders, conditional discriminators, and personalized decoders. We test its performance on several real-world datasets, where our model significantly outperforms state-of-the-art generation models in terms of sentence quality, coherence, personalization, and human evaluations. We also show empirically that the generated reviews cannot easily be distinguished from organically produced reviews and that they follow the same statistical linguistic laws.
