2021 IEEE Hot Chips 33 Symposium (HCS)
DOI: 10.1109/hcs52781.2021.9567250

SambaNova SN10 RDU: Accelerating Software 2.0 with Dataflow

Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in the compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Recent research has shown that a composition of many smaller expert models, each with several orders of magnitude fewer parameters, can m…

Cited by 16 publications (5 citation statements). References 19 publications.
“…• After last year releasing some impressive benchmark results for their reconfigurable AI accelerator technology [119] and this year publishing two deeper technology reveals [120], [121] and an applications paper with Argonne National Laboratory [122], SambaNova still has not provided any details from which we can estimate peak performance or power consumption of their solutions. • In May 2022, Intel's Habana Labs announced the second generations of the Goya inference accelerator and Gaudi training accelerator, named Greco and Gaudi2, respectively [123], [124].…”
Section: Survey of Processors
confidence: 99%
“…The time to load evk_rot^(r)'s and plaintexts, which are used once per H-(I)DFT, becomes the hard bound of latency for H-(I)DFT. Even if we assume that integrating hundreds of MBs of on-chip memory is feasible with the current fabrication technology [43], [51], [66], we must fetch the single-use data of H-(I)DFT from off-chip memory due to its large aggregate size and limited data reuse opportunities. With the latest HBM3 [41], a system having a 3TB/s off-chip memory bandwidth is feasible [28], which would allow loading the single-use data in 2.1ms (resp., 0.2ms) for H-IDFT (H-DFT).…”
Section: Memory Bottleneck in Bootstrapping
confidence: 99%
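As a quick sanity check on the excerpt above, the size of the single-use H-(I)DFT data implied by the quoted numbers can be recovered from bandwidth × load time. The sketch below is hedged: the 3 TB/s bandwidth and the 2.1 ms / 0.2 ms latencies come from the excerpt, while the derived byte counts are computed here and are not stated in the cited paper.

```python
# Back-of-envelope check of the cited H-(I)DFT load times.
# Assumes decimal units (1 TB = 1e12 bytes) and that the load
# fully saturates the stated off-chip bandwidth.

TB = 1e12  # bytes


def implied_data_bytes(bandwidth_bps: float, load_time_s: float) -> float:
    """Data volume moved at the given bandwidth in load_time_s seconds."""
    return bandwidth_bps * load_time_s


bw = 3 * TB                                    # HBM3-class system, per excerpt
h_idft_bytes = implied_data_bytes(bw, 2.1e-3)  # 2.1 ms for H-IDFT
h_dft_bytes = implied_data_bytes(bw, 0.2e-3)   # 0.2 ms for H-DFT

print(f"H-IDFT single-use data ≈ {h_idft_bytes / 1e9:.1f} GB")
print(f"H-DFT single-use data ≈ {h_dft_bytes / 1e9:.1f} GB")
```

At 3 TB/s, the quoted latencies correspond to roughly 6.3 GB and 0.6 GB of single-use data per H-IDFT and H-DFT, respectively, which is consistent with the excerpt's point that such volumes cannot reside in hundreds of MBs of on-chip memory.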
“…Recently, on-chip SRAM capacities have scaled significantly [6] such that the level of hundreds of MBs of on-chip SRAM is feasible, providing tens of TB/s of SRAM bandwidth [47,55,69]. While the bandwidth of the main-memory has also increased, its aggregate throughput is still more than an order of magnitude lower than the on-chip SRAM bandwidth [66], achieving a few TB/s of throughput even with high-bandwidth memory (HBM).…”
Section: Technology-driven Parameter Selection of Bootstrappable Acce...
confidence: 99%
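The bandwidth gap this excerpt describes can be made concrete with representative figures. The specific numbers below are illustrative assumptions chosen to match the excerpt's qualitative claim ("tens of TB/s" of SRAM bandwidth vs. "a few TB/s" of HBM throughput); they are not values from the cited papers.

```python
# Hedged sketch of the on-chip SRAM vs. off-chip HBM bandwidth gap.
# Both figures are assumed, order-of-magnitude placeholders.

sram_tbps = 40.0  # assumed aggregate on-chip SRAM bandwidth, "tens of TB/s"
hbm_tbps = 3.0    # assumed aggregate HBM throughput, "a few TB/s"

ratio = sram_tbps / hbm_tbps
print(f"SRAM/HBM aggregate bandwidth ratio ≈ {ratio:.1f}x")
```

Under these assumptions the on-chip bandwidth exceeds off-chip by more than 10×, matching the excerpt's "more than an order of magnitude" characterization.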