2021
DOI: 10.48550/arxiv.2107.07999
Preprint

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

Cited by 1 publication (2 citation statements). References: 0 publications.
“…In the graph domain, linear transformers are not well studied. Choromanski et al [10] are the first to adapt Performer-style attention kernelization to small graphs.…”
Section: Related Work
confidence: 99%
“…To the best of our knowledge, application of efficient attention models has not yet been thoroughly studied in the graph domain, e.g., only one work [10] explores the adaptation of Performer-style [11] attention approximation on small graphs. Particular challenges emerge with explicit edge features that are incorporated as attention bias in fully-connected graph transformers [34,59].…”
Section: Introduction
confidence: 99%
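
Both citation statements refer to Performer-style [11] attention kernelization applied to graphs. As a purely illustrative aid, and not code from the cited paper, the following Python/NumPy sketch shows the general idea of FAVOR+-style linear attention: queries and keys are mapped through positive random features so that attention can be computed in time linear in the number of nodes (tokens). All function names, shapes, and the random-feature count are assumptions made for this example.

# Illustrative sketch of Performer-style (FAVOR+) kernelized attention.
# Assumptions: NumPy, 64 random features, nodes treated as tokens.
import numpy as np

def softmax_kernel_features(x, projection, eps=1e-6):
    """Positive random features approximating the softmax kernel.

    x:          (n, d) queries or keys, pre-scaled by d ** -0.25.
    projection: (d, m) Gaussian random projection matrix.
    """
    # exp(w^T x - |x|^2 / 2) / sqrt(m) yields a positive estimator of
    # exp(q^T k) when applied to both queries and keys.
    m = projection.shape[1]
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ projection - sq_norm) / np.sqrt(m) + eps

def performer_attention(Q, K, V, num_features=64, seed=0):
    """Linear-time attention: O(n * m * d) instead of O(n^2 * d)."""
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((d, num_features))
    # Scale so the kernel approximates softmax(Q K^T / sqrt(d)).
    q_feat = softmax_kernel_features(Q * d ** -0.25, projection)   # (n, m)
    k_feat = softmax_kernel_features(K * d ** -0.25, projection)   # (n, m)
    kv = k_feat.T @ V                                               # (m, d_v)
    normalizer = q_feat @ np.sum(k_feat, axis=0, keepdims=True).T   # (n, 1)
    return (q_feat @ kv) / normalizer                               # (n, d_v)

if __name__ == "__main__":
    n, d = 128, 16            # e.g. n graph nodes treated as tokens
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    print(performer_attention(Q, K, V).shape)   # (128, 16)

The quadratic n-by-n attention matrix is never formed; phi(K)^T V is aggregated once and reused for every query, which is what makes this family of methods attractive for large graphs.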