2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) 2019
DOI: 10.1109/ispass.2019.00042
Timeloop: A Systematic Approach to DNN Accelerator Evaluation

Cited by 375 publications (296 citation statements); references 29 publications.
“…DimHW tiling is 6.5× faster to complete, because it only requires 128 memcpys of 16K elements to completely tile the input, compared to 262K memcpys of 8 elements. The effect of a different tiling strategy on the overall operation is harder to predict (but can be estimated with analytical models like Timeloop [45] or MAESTRO [26]). For element-wise operations, tiling strategy has next to no effect; for operations whose performance depends on exploiting data reuse, changing tiling shape may impact overall runtime.…”
Section: Tiling Optimizer
Confidence: 99%
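The memcpy arithmetic in the quote above can be checked directly: both strategies move the same 2M-element input, and the copy count is just total elements divided by transfer granularity. A minimal sketch (the function name and sizes are taken from the quoted numbers, not from the cited paper's code):

```python
# Both tiling strategies move the same input; they differ only in
# how many elements each memcpy transfers.
TOTAL_ELEMENTS = 128 * 16 * 1024  # 2,097,152 elements, per the quoted figures

def memcpy_count(elements_per_copy):
    """Number of memcpy calls needed to tile the whole input."""
    return TOTAL_ELEMENTS // elements_per_copy

dim_hw = memcpy_count(16 * 1024)  # DimHW tiling: large contiguous copies
fine = memcpy_count(8)            # fine-grained tiling: tiny copies

print(dim_hw)  # 128
print(fine)    # 262144
```

This confirms the quote's 128-vs-262K comparison: fewer, larger transfers amortize per-copy overhead, which is where the reported 6.5× speedup comes from.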
“…Some are end-to-end systems, like TensorFlow [1] or TVM [9], but they either lack simulation support or require detailed pipeline models or RTL. Other tools focus on exploring dataflows and efficiently mapping DNN kernels to FPGAs or ASICs [45,61,65,72,76,77]. These often implement a component library or templated designs for hardware optimization, but with a heavy focus on optimizing the accelerator, they cannot evaluate networks end-to-end, leaving many design opportunities unexplored.…”
Section: Related Work
Confidence: 99%
“…For designing FPGA-based DNN accelerators, current practice usually relies on roofline models [10] or customized analytical tools [13,16] to estimate the achievable performance. For ASIC-based accelerators, recently published designs [21,34,35] introduce various performance prediction methods. Eyeriss [21] proposes an energy model for capturing the energy overhead of the customized memory and computation units and a delay model that simplifies the latency calculation.…”
Section: Background and Related Work
Confidence: 99%
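The Eyeriss-style energy model mentioned above reduces to a weighted sum: access counts per hardware component times a per-access energy cost. A minimal sketch, with illustrative placeholder costs (the component names and numbers below are assumptions, not values from Eyeriss):

```python
# Hypothetical per-access energy costs in relative units; real models
# calibrate these per memory level and compute unit.
ENERGY_PER_ACCESS = {
    "MAC": 1.0,
    "reg_file": 1.0,
    "noc": 2.0,
    "global_buf": 6.0,
    "DRAM": 200.0,  # off-chip access dominates, as such models typically show
}

def total_energy(access_counts):
    """Sum access counts weighted by per-access energy cost."""
    return sum(access_counts[c] * ENERGY_PER_ACCESS[c] for c in access_counts)

accesses = {"MAC": 1_000, "reg_file": 2_000, "noc": 300,
            "global_buf": 100, "DRAM": 50}
print(total_energy(accesses))  # 14200.0
```

The design point such models capture is that a handful of DRAM accesses can outweigh thousands of register-file accesses, which is why dataflow choices that maximize on-chip reuse matter.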
“…However, the roofline model lacks fine-grained estimation, and customized models are not as general as desired. Timeloop [21] and Eyeriss [22] use for and parallel-for to describe the temporal and spatial mapping of DNN accelerators. Specifically, Timeloop obtains the number of memory accesses and estimates the latency by calculating the maximum isolated execution cycle across all hardware IPs based on a double-buffering assumption.…”
Section: Introduction
Confidence: 99%
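The latency estimate described in the quote above can be sketched in a few lines: under double buffering, each hardware level (compute, on-chip buffer, DRAM) operates concurrently, so overall latency is the maximum isolated cycle count across levels. The level names and throughput numbers here are illustrative assumptions, not Timeloop's actual model parameters:

```python
def isolated_cycles(accesses, width_per_cycle):
    """Cycles a level would need if it ran alone (ceiling division)."""
    return -(-accesses // width_per_cycle)

# Hypothetical per-level access counts and per-cycle widths.
levels = {
    "MACs": isolated_cycles(1_000_000, 256),  # compute array
    "buffer": isolated_cycles(400_000, 128),  # on-chip SRAM
    "DRAM": isolated_cycles(120_000, 16),     # off-chip memory
}

# With double buffering, levels overlap; the slowest one sets the latency.
latency = max(levels.values())
print(latency)  # 7500 -> DRAM is the bottleneck here
```

In this toy configuration the off-chip level dominates despite moving far fewer elements, illustrating why such models steer mappings toward reducing DRAM traffic.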