2019 IEEE 30th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)
DOI: 10.1109/asap.2019.00-31

Maestro: A Memory-on-Logic Architecture for Coordinated Parallel Use of Many Systolic Arrays

Cited by 20 publications (23 citation statements)
References 11 publications
“…It has been shown that many small systolic arrays may increase utilization (and thus efficiency) at the cost of performance [23]. Maestro [24, 34] showed as much, but only for short inputs on BERT-style models. However, even when scaled to 7 nm, Maestro does not compete with modern accelerators such as the A100 or TPUs.…”
Section: Related Work
confidence: 99%
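The utilization trade-off quoted above can be made concrete with a first-order tiling model. The sketch below is an illustration under assumed sizes, not a result from Maestro or the citing papers: an output-stationary array can only fill whole tiles of the output, so a short input leaves most of a large array idle, whereas small arrays can be packed without padding.

```python
import math

def tile_utilization(m, n, rows, cols):
    """First-order utilization of a rows x cols systolic array producing an
    m x n output: the array runs ceil(m/rows) * ceil(n/cols) full passes,
    so padding in the last tile row/column is wasted work."""
    tiles = math.ceil(m / rows) * math.ceil(n / cols)
    return (m * n) / (tiles * rows * cols)

# Hypothetical short-input GEMM (e.g. a small sequence length in a BERT-style layer).
m, n = 64, 96
print(tile_utilization(m, n, 128, 128))  # one 128x128 array: 0.375
print(tile_utilization(m, n, 8, 8))      # a grid of 8x8 arrays: 1.0
```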
“…As such, systolic arrays achieve higher power efficiency and peak throughput compared to dataflow architectures. Moreover, recent proposals that couple multiple systolic arrays on a single die (i.e., multi-pod designs [4, 29, 33]) allow exploiting data- and task-level parallelism, further improving the return on provisioned silicon.…”
Section: Why Scale-out Systolic Arrays?
confidence: 99%
“…While these multi-pod accelerators achieve much better utilization than their monolithic counterparts with multi-tenancy, variability in the array size requirements of workloads remains a fundamental limitation to utilization with a few coarse-grain pods. In contrast, multi-pod designs with minimally sized arrays [33] target maximum utilization. Unfortunately, these designs compromise the inference accelerator's power efficiency by over-provisioning overall on-chip memory [17] (e.g., 8 × 8 arrays incur 5–10× more memory accesses than 128 × 128 arrays).…”
Section: Introduction
confidence: 99%
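The memory-access penalty mentioned in that excerpt can be sketched with a similarly rough model (an illustrative assumption, not figures from [17] or [33]): with output-stationary tiling and no cross-tile reuse, each operand is re-streamed once per row or column of output tiles, so shrinking the array multiplies off-array traffic.

```python
import math

def input_traffic(m, k, n, rows, cols):
    """Rough off-array input traffic (elements read) for an (m x k) @ (k x n)
    matmul on an output-stationary rows x cols array with no cross-tile reuse:
    A is re-read once per column of output tiles, B once per row of tiles."""
    tile_rows = math.ceil(m / rows)
    tile_cols = math.ceil(n / cols)
    return m * k * tile_cols + k * n * tile_rows

# Hypothetical square GEMM; the ratio grows with problem size and shrinks
# when on-chip buffering restores cross-tile reuse (the cited 5-10x range).
m = k = n = 1024
big   = input_traffic(m, k, n, 128, 128)
small = input_traffic(m, k, n, 8, 8)
print(small / big)  # ~16x more input reads for 8x8 arrays under this model
```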
“…Systolic arrays are being explored extensively for improvements in the matrix operations that directly relate them to deep neural network implementations. These architectures have been found useful, especially during the inference phase of network processing (Kung et al., 2019).…”
Section: Compression Of Deep Neural Network
confidence: 99%