2023
DOI: 10.1109/tcad.2022.3212645
AdaPT: Fast Emulation of Approximate DNN Accelerators in PyTorch


Cited by 19 publications (6 citation statements)
References 14 publications
“…Moreover, as shown in Fig. 1 (a-d), the peak ARED of our scaleTRIM(3,4) is the lowest among the state-of-the-art works. Our Novel Contributions: in this paper, we propose a novel scalable approximate multiplier that utilizes a lookup-table-based compensation unit to reduce the approximation error.…”
Section: Introduction
confidence: 76%
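The quoted design pairs operand truncation with a lookup-table compensation term. A minimal Python sketch of that general idea follows; the truncation width and the table contents are illustrative assumptions, not the published scaleTRIM design:

```python
# Illustrative truncation-plus-LUT approximate multiplier for unsigned
# 8-bit operands. T and the LUT contents are assumptions for
# illustration, not the actual scaleTRIM parameters.

T = 3                      # low-order bits truncated from each operand
MASK = (1 << T) - 1

# Small 2^T x 2^T compensation table: here it restores the product of
# the two discarded tails; the hi-lo cross terms remain approximated away.
LUT = [[a_lo * b_lo for b_lo in range(1 << T)] for a_lo in range(1 << T)]

def approx_mult(a, b):
    """Approximate a*b: multiply truncated operands, add LUT correction."""
    base = ((a >> T) * (b >> T)) << (2 * T)
    return base + LUT[a & MASK][b & MASK]

def ared(a, b):
    """Absolute relative error distance of the approximation (a, b nonzero)."""
    exact = a * b
    return abs(exact - approx_mult(a, b)) / exact
```

When both truncated tails are zero the result is exact (e.g. `approx_mult(8, 8) == 64`); otherwise the error comes only from the dropped cross terms, so the approximate product never exceeds the exact one.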
“…However, the MARED of one of our proposed multiplier configurations (e.g., scaleTRIM(3,4)) is 3.73%. Moreover, as shown in Fig.…”
Section: Introduction
confidence: 90%
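MARED here is the mean absolute relative error distance of a multiplier over its input space. A small sketch of how such a figure can be computed (exhaustive operand sweep; normalizing by the exact product is one common convention and is assumed here, since papers differ on this point):

```python
def mared(mult, bits=8):
    """Mean absolute relative error distance of multiplier `mult` over
    all nonzero operand pairs, relative to the exact product."""
    total, count = 0.0, 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            total += abs(exact - mult(a, b)) / exact
            count += 1
    return 100.0 * total / count  # as a percentage
```

An exact multiplier scores 0% under any convention; the quoted 3.73% for scaleTRIM(3,4) is the figure reported by the citing paper under its own convention.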
“…MARLIN was run on a 32-thread Ryzen 5950X CPU with 64 GB of DDR4 RAM and an Nvidia Quadro RTX A5000. The GPU was used only during the initial training of the FP32 and exact INT8 NNs presented in Table VIII, while the CPU was used to simulate the approximate convolutional layers during the training, validation, and testing performed during the search, since AdaPT only supports CPU computation [33]. The number of threads was set to 16 for every experiment to compare how MARLIN's execution time scales with NNs of different depths.…”
Section: E. Discussion
confidence: 99%
“…A single runtime reconfigurable approximate unit or several multipliers with fixed approximation levels could be integrated into the framework with little or no modification. Once this high-level description of the computational unit is available, the approximate model can be implemented and tested through the AdaPT library [33]. Any NN topology built with PyTorch's convolutional and fully connected layers can be easily included in MARLIN by overloading the layer definitions with the AdaPT ones without retraining or changing the model architecture.…”
Section: Proposed Methodology, A. Overview
confidence: 99%
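The drop-in pattern described above — overloading exact layer definitions with approximate ones while keeping the trained weights and topology — can be sketched without the real AdaPT API. All class and function names below are illustrative, and a toy 1-LSB-truncating multiply stands in for the approximate unit:

```python
# Hedged sketch of swapping exact layers for approximate drop-ins that
# reuse the trained weights; names are illustrative, not AdaPT's API.

class ExactLinear:
    def __init__(self, weights):
        self.weights = weights          # one weight row per output

    def mult(self, w, x):
        return w * x                    # exact product

    def __call__(self, xs):
        return [sum(self.mult(w, x) for w, x in zip(row, xs))
                for row in self.weights]

class ApproxLinear(ExactLinear):
    """Same layer, approximate integer multiply (toy: truncate 1 LSB)."""
    def mult(self, w, x):
        return ((w * x) >> 1) << 1

class TinyNet:
    """Toy 'trained' network: a single 1-output linear layer."""
    def __init__(self):
        self.fc = ExactLinear([[2, 3]])

    def __call__(self, xs):
        return self.fc(xs)

def swap_layers(model, exact_cls, approx_cls):
    """Replace each exact layer with an approximate one, reusing its
    weights in place -- no retraining, no change to the architecture."""
    for name, layer in list(vars(model).items()):
        if type(layer) is exact_cls:
            setattr(model, name, approx_cls(layer.weights))
    return model
```

The same overloading idea is what lets a search framework toggle between exact and approximate arithmetic on an already-trained model.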
“…Additionally, a design of a MAC (multiply-accumulate) unit in a systolic array is synthesized for ASIC to better illustrate the results in an AI core. The impact of the proposed adaptive multiplier on accuracy is studied on different networks (i.e., LeNet-5, AlexNet, ResNet-18, VGG-16, DenseNet) trained on MNIST and CIFAR-10 using 8-bit INT, with the help of the AdaPT framework [27]. Finally, the impact of the proposed multiplier on the reliability of DNNs is studied using the mentioned benchmarks.…”
Section: A. Experimental Setup
confidence: 99%