2021
DOI: 10.48550/arxiv.2110.13985
Preprint

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

Abstract: Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u → y by simply simulating a linear continuous-time state-space representation…
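The state-space view in the abstract can be made concrete with a minimal sketch (illustrative only, not the authors' released code): discretize the continuous-time system x'(t) = A x(t) + B u(t), y(t) = C x(t) + D u(t) with a bilinear transform and unroll the resulting linear recurrence over the input sequence. All function names, shapes, and parameter values below are assumptions chosen for the example.

```python
import numpy as np

def lssl_sketch(u, A, B, C, D, dt):
    """Illustrative sketch of a Linear State-Space Layer (not the authors' code).

    Simulates x'(t) = A x(t) + B u(t), y(t) = C x(t) + D u(t) on a length-L
    input sequence u, using a bilinear discretization followed by a
    linear-time recurrent unrolling of the discrete system.
    """
    N = A.shape[0]
    I = np.eye(N)
    inv = np.linalg.inv(I - (dt / 2) * A)   # bilinear (Tustin) transform
    A_bar = inv @ (I + (dt / 2) * A)        # discrete state matrix
    B_bar = inv @ (dt * B)                  # discrete input matrix

    x = np.zeros(N)
    ys = []
    for u_k in u:                           # one state update per step: O(L) in sequence length
        x = A_bar @ x + B_bar * u_k
        ys.append(C @ x + D * u_k)
    return np.array(ys)

# Toy usage with hypothetical shapes: state size 4, scalar input/output channel.
rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))
B, C, D = rng.standard_normal(4), rng.standard_normal(4), 0.0
y = lssl_sketch(rng.standard_normal(32), A, B, C, D, dt=0.1)
print(y.shape)   # (32,)
```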

Cited by 1 publication (1 citation statement)
References 22 publications
“…With the ever-increasing scale of parameters and the elongation of input token sequences, various large-scale models built upon the transformer block inevitably encounter computational efficiency issues concerning long sequences. Against this backdrop, the Mamba architecture is emerging as an innovative design, integrating the state space model (SSM) [172,173] framework with the transformer [21] architecture, thereby reducing reliance on the attention mechanism and realizing linear-time complexity in sequence modeling.…”
Section: Less Computation, More Tokens (mentioning)
confidence: 99%
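The linear-time complexity mentioned in the citation statement rests on a property that the LSSL paper makes explicit: a linear state-space model can be unrolled either as an O(L) recurrence or, equivalently, as a causal convolution whose kernel is built from powers of the discrete state matrix. The check below is a minimal sketch under assumed shapes and random parameters; the discrete system (A_bar, B_bar, C) is hypothetical, e.g. one produced by the discretization sketched above.

```python
import numpy as np

def ssm_conv_kernel(A_bar, B_bar, C, L):
    """Kernel K[i] = C @ A_bar^i @ B_bar: the convolutional view of a linear SSM."""
    K, x = [], B_bar.copy()
    for _ in range(L):
        K.append(C @ x)
        x = A_bar @ x
    return np.array(K)

# Hypothetical discrete system with state size 3 and a length-16 input.
rng = np.random.default_rng(1)
A_bar = 0.9 * np.eye(3) + 0.05 * rng.standard_normal((3, 3))
B_bar, C = rng.standard_normal(3), rng.standard_normal(3)
u = rng.standard_normal(16)

# Recurrent view: O(L) sequential state updates.
x, y_rec = np.zeros(3), []
for u_k in u:
    x = A_bar @ x + B_bar * u_k
    y_rec.append(C @ x)

# Convolutional view: a single causal convolution with kernel K (parallelizable).
K = ssm_conv_kernel(A_bar, B_bar, C, len(u))
y_conv = np.array([K[: k + 1][::-1] @ u[: k + 1] for k in range(len(u))])

print(np.allclose(y_rec, y_conv))   # True: both views give the same output
```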