2021
DOI: 10.48550/arxiv.2104.08698
Preprint

A Simple and Effective Positional Encoding for Transformers

Cited by 3 publications (3 citation statements)
References 9 publications
“…In the original version of a transformer, this projection is replaced by pre-trained token embeddings [37,41]. Afterwards, an encoding layer [90] added positional information that would have been lost in the attention module [37]. After the positional encoding, a multi-head attention module calculated attention weights encoding temporal dynamics.…”
Section: Model Architectures (mentioning)
confidence: 99%
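The pipeline described in this excerpt (token embedding, additive positional encoding, then multi-head self-attention) can be summarized roughly as below. This is a minimal PyTorch sketch; the class name, the learned positional embedding, and all hyperparameters are illustrative assumptions, not taken from the cited work.

```python
import torch
import torch.nn as nn

class EmbedEncodeAttend(nn.Module):
    """Token embedding -> additive positional encoding -> multi-head self-attention.

    Names and hyperparameters are illustrative, not from the cited work.
    """

    def __init__(self, vocab_size=10000, d_model=128, n_heads=4, max_len=512):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, d_model)  # stand-in for pre-trained token embeddings
        self.pos_embed = nn.Embedding(max_len, d_model)        # positional encoding layer
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, token_ids):                              # token_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_embed(token_ids) + self.pos_embed(positions)  # re-inject order information
        out, attn_weights = self.attn(x, x, x)                 # self-attention over the sequence
        return out, attn_weights
```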
“…[53] Because each word position is mapped onto sine and cosine curves of different periods through the trigonometric transformation, every position obtains a unique positional encoding. In addition, recent studies report advanced positional encodings such as Decoupled posItional attEntion for Transformers (DIET) [54] and the Position Encoding Generator (PEG) [55].…”
Section: Positional Encoding (mentioning)
confidence: 99%
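The fixed sinusoidal scheme this excerpt refers to assigns each position a vector of sine and cosine values with geometrically spaced periods, as in the original Transformer. A minimal sketch follows, assuming PyTorch and an even model dimension; the function name is illustrative.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encoding in the style of the original Transformer:
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Sinusoids of different periods give every position a unique vector.
    Assumes an even d_model.
    """
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))               # (d_model / 2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe
```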
“…It might be difficult to effectively capture the global context or long-range dependencies, which are essential for comprehending complex scenes or capturing relationships between distant objects. To address some of these issues, involution neural networks (INNs), which are computationally efficient and more parallelizable, are used as alternatives to CNNs (Chen et al., 2021). Unlike standard convolution kernels, which are spatially agnostic and channel-specific, involution kernels are better suited to capturing long-range spatial information while minimizing network parameters.…”
Section: Introduction (mentioning)
confidence: 99%
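For context on the contrast drawn in this excerpt: an involution layer generates its kernel from the input feature at each spatial location (spatially specific) and shares it across channel groups (channel agnostic), inverting convolution's spatially shared, channel-specific design. Below is a minimal PyTorch sketch under those assumptions; the class name and hyperparameters are illustrative and not taken from the cited works.

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Minimal involution layer: the kernel is predicted from the input at
    every spatial position and shared across channel groups."""

    def __init__(self, channels=16, kernel_size=3, groups=4, reduction=4):
        super().__init__()
        self.k, self.g = kernel_size, groups
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction, kernel_size * kernel_size * groups, 1)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        # 1. Generate a K*K kernel per location and per channel group.
        kernel = self.span(torch.relu(self.reduce(x)))          # (B, K*K*G, H, W)
        kernel = kernel.view(b, self.g, self.k * self.k, h, w).unsqueeze(2)
        # 2. Unfold local neighbourhoods of the input.
        patches = self.unfold(x).view(b, self.g, c // self.g, self.k * self.k, h, w)
        # 3. Weighted sum over each neighbourhood with its location-specific kernel.
        return (kernel * patches).sum(dim=3).view(b, c, h, w)
```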