Proceedings of the 26th Asia and South Pacific Design Automation Conference 2021
DOI: 10.1145/3394885.3431539
DeepOpt

Cited by 13 publications (3 citation statements). References 6 publications.
“…With the development of heterogeneous computing hardware, GPU, ASIC, and FPGA have demonstrated significant speed-up performance in the field of deep learning [15]- [20]. How to apply them to FL to address the above bottlenecks has become a hot research topic in academia and industry [21]- [26].…”
Section: Introduction (mentioning)
Confidence: 99%
“…For example, SCALE-Sim [2] develops an inference simulator for ASIC-based systolic accelerators that models convolution layers only. Timeloop [3], MAESTRO [4], and DeepOpt [5] propose DNN dataflow analysis frameworks for inference accelerators, but their evaluations focus only on convolution layers, and the work in [6] proposes an energy estimation model for the convolution operation only. While the modeling efforts in [7], [8] include pooling and tensor addition along with convolution, their scope is limited to inference and do not have support for training operations.…”
Section: Introduction (mentioning)
Confidence: 99%
“…This can lead to inaccurate estimation of runtime, especially during training, where the operations are extremely memory-intensive. The works in [5], [7] do not support tiling across the kernel height-width dimensions. While this is acceptable for the inference phase, where the kernel height×width is quite small (between 1×1 and 11×11 for standard CNNs), during training the kernel-like operands can be too large to fit in the on-chip memory; therefore, tiling across these kernel dimensions is necessary.…”
Section: Introduction (mentioning)
Confidence: 99%
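The last statement argues that kernel-dimension tiling is needed when the weights (or the kernel-sized operands that arise in training) exceed the on-chip buffer. A minimal sketch of that idea follows; all names, tile-selection heuristics, and sizes here are hypothetical illustrations, not the scheme used by DeepOpt or the citing work.

```python
# Illustrative sketch (hypothetical names and sizes, not from the paper):
# tile a convolution kernel's height-width dimensions so that each tile
# of weights fits within a fixed on-chip buffer budget.

def kernel_tiles(kh, kw, in_ch, out_ch, bytes_per_elem, buffer_bytes):
    """Yield (h0, h1, w0, w1) tile bounds over the kernel's spatial dims.

    A tile holds weights of shape (h1 - h0, w1 - w0, in_ch, out_ch); the
    tile height/width are shrunk until one tile fits in `buffer_bytes`.
    """
    def tile_bytes(th, tw):
        return th * tw * in_ch * out_ch * bytes_per_elem

    th, tw = kh, kw
    # Greedily shrink the larger spatial dimension until the tile fits.
    while tile_bytes(th, tw) > buffer_bytes and (th > 1 or tw > 1):
        if th >= tw and th > 1:
            th -= 1
        else:
            tw -= 1
    for h0 in range(0, kh, th):
        for w0 in range(0, kw, tw):
            yield h0, min(h0 + th, kh), w0, min(w0 + tw, kw)

# Example: a 56x56 kernel-sized operand, as can occur during training,
# split to fit a 1 MiB on-chip buffer (fp16 weights, 64x64 channels).
tiles = list(kernel_tiles(kh=56, kw=56, in_ch=64, out_ch=64,
                          bytes_per_elem=2, buffer_bytes=1 << 20))
```

For inference-sized kernels (e.g. 3×3) the whole kernel fits in one tile and the loop degenerates, which is why inference-only frameworks can skip this dimension of tiling entirely.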