Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.236

A Simple and Effective Positional Encoding for Transformers

Abstract: Transformer models are permutation equivariant. To supply the order and type information of the input tokens, position and segment embeddings are usually added to the input. Recent works proposed variations of positional encodings, with relative position encodings achieving better performance. Our analysis shows that the gain actually comes from moving positional information from the input to the attention layer. Motivated by this, we introduce Decoupled posItional attEntion for Transformers (DIET), a simple yet ef…
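To make the abstract's distinction concrete, here is a minimal NumPy sketch (not the authors' reference implementation; the scalar-per-offset bias and all names are illustrative assumptions) contrasting absolute position embeddings added to the input with a decoupled positional bias added directly to the attention logits, in the spirit of DIET:

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 6, 16
tokens = rng.normal(size=(seq_len, d_model))   # token (content) embeddings
pos_emb = rng.normal(size=(seq_len, d_model))  # learned absolute position embeddings
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))

# (a) Classic absolute encoding: positions are added to the input,
#     so content and position mix inside the query/key projections.
x_abs = tokens + pos_emb
scores_abs = (x_abs @ Wq) @ (x_abs @ Wk).T / np.sqrt(d_model)

# (b) Decoupled positional attention: the input carries only content;
#     positional information enters as a separate additive bias on the
#     attention logits (here, one learned scalar per relative offset).
rel_bias = rng.normal(size=(2 * seq_len - 1,))
offsets = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
bias = rel_bias[offsets + seq_len - 1]          # (seq_len, seq_len)
scores_dec = (tokens @ Wq) @ (tokens @ Wk).T / np.sqrt(d_model) + bias

print(softmax(scores_abs).shape, softmax(scores_dec).shape)  # (6, 6) (6, 6)

In the second variant, token content and the positional term never mix inside the query/key projections; position enters the attention scores only as its own additive component.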

Cited by 25 publications (8 citation statements)
References 12 publications
“…Because each word is matched to sine and cosine curves of different periods through the trigonometric transformation, each position obtains a unique positional encoding. In addition, recent research reports advanced positional encodings such as Decoupled posItional attEntion for Transformers (DIET) [54] and Position Encoding Generator (PEG) [55]…”
Section: Basic Architecture of Transformers (mentioning)
confidence: 99%
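The sinusoidal scheme described in the excerpt above is the standard encoding from the original Transformer paper; a short, self-contained NumPy version for reference (the helper name is ours):

import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)   # geometrically spaced frequencies
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=8, d_model=16)
print(pe.shape)  # (8, 16); each row is a unique, deterministic position vector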
“…To address this, the first Transformer models added an "absolute" positional encoding to the model input which communicates the order of the input sequence [7]. Since then, alternative "relative" positional encoding methods have been proposed for sequential inputs which inject a bias term at different locations within the Transformer architecture [8], [9]. The aim of these strategies is the same: communicate the structure of the input data to the Transformer.…”
Section: Capturing Graph Structure (mentioning)
confidence: 99%
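As one concrete example of a relative scheme that injects a bias term inside the attention computation, here is a hedged sketch in the style of Shaw et al. (2018): learned embeddings indexed by the clipped query-key offset contribute a content-position term to the attention logits. Function and variable names are our own, not taken from the cited works:

import numpy as np

def relative_attention_logits(x, Wq, Wk, rel_emb, max_dist):
    # x: (seq_len, d_model); rel_emb: (2 * max_dist + 1, d_model)
    seq_len, d_model = x.shape
    q, k = x @ Wq, x @ Wk
    content = q @ k.T                                          # content-content term
    offsets = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    offsets = np.clip(offsets, -max_dist, max_dist) + max_dist  # clip to learned range
    rel_k = rel_emb[offsets]                                    # (seq_len, seq_len, d_model)
    position = np.einsum('qd,qkd->qk', q, rel_k)                # content-position bias
    return (content + position) / np.sqrt(d_model)

rng = np.random.default_rng(0)
L, D, M = 5, 8, 3
logits = relative_attention_logits(
    rng.normal(size=(L, D)), rng.normal(size=(D, D)),
    rng.normal(size=(D, D)), rng.normal(size=(2 * M + 1, D)), M)
print(logits.shape)  # (5, 5)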
“…However, Pu-Chin et al. found that absolute positional encodings underperform relative positional encodings and suffer from limitations in the rank of the resulting attention matrices [9]. As a result, we focus our work exclusively on relative positional encoding strategies.…”
Section: Capturing Graph Structure (mentioning)
confidence: 99%
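The rank limitation alluded to above is easiest to see by expanding the pre-softmax attention score when absolute position embeddings p_i are added to token embeddings x_i; the decomposition below is the standard one (our notation, not copied from [9]):

\[
A_{ij} = \frac{(x_i + p_i)^\top W_Q W_K^\top (x_j + p_j)}{\sqrt{d}}
       = \frac{x_i^\top W_Q W_K^\top x_j + x_i^\top W_Q W_K^\top p_j + p_i^\top W_Q W_K^\top x_j + p_i^\top W_Q W_K^\top p_j}{\sqrt{d}}
\]

Only the first term is purely content-based; the position-dependent terms are all routed through the same projection product W_Q W_K^\top, whose rank is bounded by the per-head dimension, which is where the constraint discussed in [9] arises.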
“…Therefore, we investigate whether the performance improvement of our method truly stems from the utterance dependencies or merely from segmenting the conversation. We compare ReDE with the segment encoding method introduced by Chen et al. [17]. To make a fair comparison, the number of parameters in the compared methods is kept the same.…”
Section: Ablation Study (mentioning)
confidence: 99%
“…For example, the question in Case #1 is "What is the other way ...". The baseline finds a wrong answer, "ls FILEPATH or ls FILEPATH", in the second utterance because the phrase "Many ways" strongly matches "other way" in the question, and the "...or..." pattern to some extent carries the meaning of "the other way". In Case #2 the question is "How does ... use the different icon"; the baseline finds a wrong answer with the close pattern "get ... to use a different icon", but fails to consider the true meaning of the selected … RoBERTa seg denotes the segment encoding method [17].…”
Section: Case Study (mentioning)
confidence: 99%