2021
DOI: 10.48550/arxiv.2110.13985
Preprint

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

Abstract: Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u → y by simply simulating a linear continuous-time state-space representation…
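The state-space view in the abstract can be made concrete with a minimal sketch (illustrative only, not the authors' released code): discretize the continuous-time system x'(t) = A x(t) + B u(t), y(t) = C x(t) + D u(t) with a bilinear transform and unroll the resulting linear recurrence over the input sequence. All function names, shapes, and parameter values below are assumptions chosen for the example.

```python
import numpy as np

def lssl_sketch(u, A, B, C, D, dt):
    """Illustrative sketch of a Linear State-Space Layer (not the authors' code).

    Simulates x'(t) = A x(t) + B u(t), y(t) = C x(t) + D u(t) on a length-L
    input sequence u, using a bilinear discretization followed by a
    linear-time recurrent unrolling of the discrete system.
    """
    N = A.shape[0]
    I = np.eye(N)
    inv = np.linalg.inv(I - (dt / 2) * A)   # bilinear (Tustin) transform
    A_bar = inv @ (I + (dt / 2) * A)        # discrete state matrix
    B_bar = inv @ (dt * B)                  # discrete input matrix

    x = np.zeros(N)
    ys = []
    for u_k in u:                           # one state update per step: O(L) in sequence length
        x = A_bar @ x + B_bar * u_k
        ys.append(C @ x + D * u_k)
    return np.array(ys)

# Toy usage with hypothetical shapes: state size 4, scalar input/output channel.
rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))
B, C, D = rng.standard_normal(4), rng.standard_normal(4), 0.0
y = lssl_sketch(rng.standard_normal(32), A, B, C, D, dt=0.1)
print(y.shape)   # (32,)
```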

Cited by 1 publication (1 citation statement)
References 22 publications
“…With the ever-increasing scale of parameters and the elongation of input token sequences, various large-scale models built upon the transformer block inevitably encounter computational efficiency issues concerning long sequences. Against this backdrop, the Mamba architecture is emerging as an innovative design, integrating the state space model (SSM) [172,173] framework with the transformer [21] architecture, thereby reducing reliance on the attention mechanism and realizing linear-time complexity in sequence modeling.…”
Section: Less Computation, More Tokens (mentioning)
confidence: 99%
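The linear-time complexity mentioned in the citation statement rests on a property that the LSSL paper makes explicit: a linear state-space model can be unrolled either as an O(L) recurrence or, equivalently, as a causal convolution whose kernel is built from powers of the discrete state matrix. The check below is a minimal sketch under assumed shapes and random parameters; the discrete system (A_bar, B_bar, C) is hypothetical, e.g. one produced by the discretization sketched above.

```python
import numpy as np

def ssm_conv_kernel(A_bar, B_bar, C, L):
    """Kernel K[i] = C @ A_bar^i @ B_bar: the convolutional view of a linear SSM."""
    K, x = [], B_bar.copy()
    for _ in range(L):
        K.append(C @ x)
        x = A_bar @ x
    return np.array(K)

# Hypothetical discrete system with state size 3 and a length-16 input.
rng = np.random.default_rng(1)
A_bar = 0.9 * np.eye(3) + 0.05 * rng.standard_normal((3, 3))
B_bar, C = rng.standard_normal(3), rng.standard_normal(3)
u = rng.standard_normal(16)

# Recurrent view: O(L) sequential state updates.
x, y_rec = np.zeros(3), []
for u_k in u:
    x = A_bar @ x + B_bar * u_k
    y_rec.append(C @ x)

# Convolutional view: a single causal convolution with kernel K (parallelizable).
K = ssm_conv_kernel(A_bar, B_bar, C, len(u))
y_conv = np.array([K[: k + 1][::-1] @ u[: k + 1] for k in range(len(u))])

print(np.allclose(y_rec, y_conv))   # True: both views give the same output
```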