AutoDNNchip

Xu, Pengfei; Zhang, Xiaofan; Hao, Cong; Zhao, Yang; Zhang, Yongan; Wang, Yue; Li, Chaojian; Guan, Zetong; Chen, Deming; Lin, Yingyan

doi:10.1145/3373087.3375306

Cited by 61 publications

(3 citation statements)

References 33 publications

“…The definitions specify simulation functionality, execution costs like area, latency, or energy [46], or hardware synthesis of each component [43]. Frequently used high-level components can also be defined, such as vector units, systolic array, and computation/memory tiles [47], as specified in libraries like MAGNet [1] and AutoDNNChip [48]. Library also integrates reliability and security costs for components and specialized components such as razor flip flops for detecting timing violations or trusted memory.…”

Section: B End-to-end Agile Design Workflow (mentioning)

confidence: 99%

Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems

Dave, Marchisio, Hanif, et al. 2022

2022 IEEE 40th VLSI Test Symposium (VTS)

The real-world use cases of Machine Learning (ML) have exploded over the past few years. However, the current computing infrastructure is insufficient to support all real-world applications and scenarios. Apart from high efficiency requirements, modern ML systems are expected to be highly reliable against hardware failures as well as secure against adversarial and IP stealing attacks. Privacy concerns are also becoming a first-order issue. This article summarizes the main challenges in agile development of efficient, reliable and secure ML systems, and then presents an outline of an agile design methodology to generate efficient, reliable and secure ML systems based on user-defined constraints and objectives.

“…Timeloop [3], MAESTRO [4], and DeepOpt [5] propose DNN dataflow analysis frameworks for inference accelerators, but their evaluations focus only on convolution layers, and the work in [6] proposes an energy estimation model for the convolution operation only. While the modeling efforts in [7], [8] include pooling and tensor addition along with convolution, their scope is limited to inference and do not have support for training operations. TRIM [9] proposes a design space explorer for DNN training, but does not support batch normalization, a heavy training workload, and is not evaluated on mainstream ASIC accelerators [1], [10]-[13] that use systolic or vector dot-product style hardware.…”

Section: Introduction (mentioning)

confidence: 99%

Physically Accurate Learning-based Performance Prediction of Hardware-accelerated ML Algorithms

Esmaeilzadeh, Ghodrati, Kahng, et al. 2022

Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD

Today's performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference, and lack support for training operations. This work proposes a novel performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. The modeling effort of SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64×64 processing array, non-convolution operations constitute 59.5% of total runtime for ResNet-50 training workload. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves 18× performance improvement over a generic static resource allocation for ResNet-50 inference.

“…However, because the conventional design flow of RTL programming is complicated and error-prone, it takes considerable design efforts to realize the customized GNN accelerator case by case for the target GNN workload or on the target FPGA devices. Though there is a lot of prior study on automating the flow of DNN accelerator development for FPGA [5,24,31], it is non-trivial to directly apply prior DNN-FPGA automation frameworks to the GNN-based applications. The reason is mainly attributed to three factors including 1) they are built on the traditional deep learning frameworks such as Caffe and Pytorch.…”

Section: Introduction (mentioning)

confidence: 99%