2021
DOI: 10.1609/aaai.v35i16.17692

Continuous Self-Attention Models with Neural ODE Networks

Abstract: Stacked self-attention models have received widespread attention due to their ability to capture global dependencies among words. However, stacking many layers and components produces a huge number of parameters, leading to low parameter efficiency. In response to this issue, we propose a lightweight architecture named Continuous Self-Attention models with neural ODE networks (CSAODE). In CSAODE, continuous dynamical models (i.e., neural ODEs) are coupled with our proposed self-attention block to form a self-attention ODE …
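The abstract pairs a self-attention block with a neural ODE so that one set of attention weights is reused across a continuous depth, rather than being re-instantiated per stacked layer. Below is a minimal sketch of such a self-attention ODE block, assuming a torchdiffeq-style odeint solver and single-head attention; the class names and hyperparameters are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq


class SelfAttentionODEFunc(nn.Module):
    """Hypothetical ODE dynamics f(t, z): single-head self-attention over
    tokens. Illustrative only; the paper's CSAODE block may differ."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, t, z):
        # z: (batch, seq_len, d_model); dz/dt is the attention output
        attn = torch.softmax(
            self.q(z) @ self.k(z).transpose(-2, -1) * self.scale, dim=-1
        )
        return attn @ self.v(z)


class SelfAttentionODEBlock(nn.Module):
    """Integrates the attention dynamics from t=0 to t=1, replacing a stack
    of discrete self-attention layers with one shared continuous-depth block."""

    def __init__(self, d_model: int):
        super().__init__()
        self.func = SelfAttentionODEFunc(d_model)

    def forward(self, z0):
        t = torch.tensor([0.0, 1.0])
        # odeint returns the state at each requested time; keep the final one
        return odeint(self.func, z0, t, method="dopri5")[-1]
```

Integrating from t = 0 to t = 1 with an adaptive solver plays the role of an entire stack of discrete attention layers while holding only one layer's worth of weights, which is where the parameter savings come from.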

Cited by 8 publications (3 citation statements) · References 22 publications
“…For example, due to the nature of adaptive step-size ODE solvers, it is very common for many consecutive layers to be dynamically equivalent; in [41], this problem is addressed by applying optimal transport theory to encourage simpler trajectory dynamics. Recent developments extend these ideas to continuous-time video forecasting [13] and continuous attention architectures [42], [43]. However, the application to anytime human 3D pose forecasting has not been explored.…”
Section: Related Work
confidence: 99%
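The optimal-transport regularization this snippet attributes to [41] is commonly realized by penalizing the kinetic energy of the ODE trajectory, which discourages unnecessarily curved solution paths. A minimal sketch in that spirit follows; the function name and the left Riemann-sum discretization are my illustration, not the cited method's exact formulation.

```python
import torch


def kinetic_regularizer(func, z_traj, t_grid):
    """Penalize the kinetic energy integral of ||f(z(t), t)||^2 dt along an
    ODE solution, approximated with a left Riemann sum. `z_traj` holds the
    states at the times in `t_grid` (e.g., as returned by odeint).
    Illustrative sketch only."""
    energy = 0.0
    for i in range(len(t_grid) - 1):
        dt = t_grid[i + 1] - t_grid[i]
        f = func(t_grid[i], z_traj[i])           # dynamics at this state
        energy = energy + dt * f.pow(2).mean()   # mean squared "speed"
    return energy  # add lambda * energy to the task loss
```

Straighter trajectories are cheaper for adaptive step-size solvers to follow, so this term also tends to reduce the number of function evaluations at inference time.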
“…Neural differential equations (Chen et al. 2018; Zhang et al. 2021) were proposed to build a continuous-time state machine for learning representations of sequence data $\{x_{t_n}\}$ whose time points $\{t_n\}_{n=1}^{N}$ are irregularly sampled. The NDE learns the transformation dynamics so as to characterize the state transition z(t) at continuous time t between input samples and output targets based on an ordinary differential equation (ODE).…”
Section: Continuous-time State Machine
confidence: 99%
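Written out, the continuous-time state transition the snippet refers to is the standard neural-ODE formulation (a generic form, not quoted from the cited papers):

```latex
\frac{d z(t)}{dt} = f_\theta\big(z(t), t\big),
\qquad
z(t_{n+1}) = z(t_n) + \int_{t_n}^{t_{n+1}} f_\theta\big(z(t), t\big)\, dt
```

Because the integral can be evaluated up to any $t$, irregularly sampled time points $\{t_n\}$ are handled naturally, which is exactly the property the snippet highlights.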
“…An attention-based neural network solution to fake news detection is put forth by Zhang et al. [16]. When producing predictions, their method uses attention to focus on the most crucial information in news items.…”
Section: Introduction
confidence: 99%
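The "focusing" the snippet describes is typically realized as attention pooling, where learned weights select the most informative parts of an item before classification; a generic form (notation mine, not taken from [16]):

```latex
\alpha_i = \frac{\exp\!\big(w^\top \tanh(W h_i)\big)}{\sum_j \exp\!\big(w^\top \tanh(W h_j)\big)},
\qquad
c = \sum_i \alpha_i\, h_i
```

Here $h_i$ are token or sentence representations of a news item, the weights $\alpha_i$ indicate how crucial each part is, and the context vector $c$ feeds the prediction layer.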