2018
DOI: 10.48550/arxiv.1806.05794
Preprint

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

Mohsen Imani,
Mohammad Samragh,
Yeseong Kim
et al.

Abstract: Deep neural networks (DNNs) have demonstrated effectiveness for various applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on either general-purpose processors, ASIC designs, or FPGA accelerators, all of which suffer from data movements due to the limited on-chip memory and data transfer bandwidth. In this work, we propose a novel framework, called RAPIDNN, which performs neuron-to-memory transformation in order to accel…
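The "neuron-to-memory transformation" in the abstract can be pictured as replacing multiply-accumulate arithmetic with precomputed lookup tables held in memory. The Python sketch below illustrates that general idea; the codebook size, uniform quantization, and the helper names build_codebook and quantize are illustrative assumptions, not RAPIDNN's actual encoding.

```python
import numpy as np

def build_codebook(values, levels=16):
    """Uniformly quantize a tensor's value range into `levels` representatives."""
    lo, hi = values.min(), values.max()
    return np.linspace(lo, hi, levels)

def quantize(values, codebook):
    """Map each value to the index of its nearest codebook entry."""
    return np.abs(values[..., None] - codebook).argmin(axis=-1)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))   # weights of one layer (illustrative size)
x = rng.standard_normal(128)         # input activations

w_cb, x_cb = build_codebook(w), build_codebook(x)
# Precompute every possible product of a weight code and an input code.
# This small 16x16 table is the kind of object an in-memory design can store
# once and reuse, instead of instantiating multipliers.
product_lut = np.outer(w_cb, x_cb)

w_idx, x_idx = quantize(w, w_cb), quantize(x, x_cb)
# Each neuron's dot product becomes a sum of table lookups.
y_lut = product_lut[w_idx, x_idx[None, :]].sum(axis=1)
y_exact = w @ x
print("max abs error:", np.abs(y_lut - y_exact).max())
```

With a modest codebook the lookup result tracks the exact dot product closely, which is the error-resilience property such frameworks rely on.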

Cited by 4 publications (5 citation statements)
References 43 publications
“…& Baseline / 16x CAP sharing architectures. 4-bit weight/activation quantization results in a negligible decrease in functional performance (and actually better performance for ResNet) [51,52]. [18], the chip footprint of the 3D a-Cortex is ~16 / ~7 times smaller, while its energy efficiency is lower by only a factor of ~5.4 / ~5.…”
Section: Comparison With Prior Work
Citation type: mentioning, confidence: 99%
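For intuition on the 4-bit weight/activation quantization the statement above refers to, here is a minimal Python sketch of generic symmetric uniform quantization; the quantize_uniform helper and the error metric are illustrative choices, not the exact scheme used in [51,52].

```python
import numpy as np

def quantize_uniform(t, bits=4):
    """Symmetric uniform quantizer: snap floats to 2**bits signed levels and back."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit signed
    scale = np.abs(t).max() / qmax                 # per-tensor scale (assumption)
    q = np.clip(np.round(t / scale), -qmax - 1, qmax)
    return q * scale                               # dequantized ("fake-quantized") values

rng = np.random.default_rng(1)
w = rng.standard_normal(10_000)                    # stand-in for a weight tensor
w4 = quantize_uniform(w, bits=4)
# The relative perturbation is small, which is why task accuracy often barely
# moves at 4 bits, as the quoted comparison observes.
print("mean |error| / mean |w| =", np.abs(w4 - w).mean() / np.abs(w).mean())
```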
“…On the other hand, at the system level, quite a few efforts were recently made to exploit the efficiency of MS operators to develop better DNN/RNN processor architectures [46][47][48][49][50][51][52]. For example, the ISAAC [46] and PUMA [47] architectures are 2D mesh structures of tiles, where each tile contains several small (typically 128×128) ReRAM-based VMM units with their I/O peripherals.…”
Section: Comparison With Prior Work
Citation type: mentioning, confidence: 99%
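To make the 128×128 ReRAM-based VMM units mentioned above concrete, the sketch below models a crossbar as a first-order analog dot product, where column currents are I = Gᵀ V by Kirchhoff's current law. The array size, conductance range, and noise model are assumptions for illustration, not parameters taken from ISAAC or PUMA.

```python
import numpy as np

N = 128                                     # crossbar dimension (illustrative)
rng = np.random.default_rng(2)
G = rng.uniform(1e-6, 1e-4, size=(N, N))    # cell conductances in siemens (assumed range)
V = rng.uniform(0.0, 0.2, size=N)           # row drive voltages in volts (assumed range)

I_ideal = G.T @ V                           # ideal column currents: the analog VMM result
# Analog non-idealities (device variation, wire resistance) perturb the currents;
# model them crudely here as 1% multiplicative noise before ADC digitization.
I_noisy = I_ideal * (1 + rng.normal(0.0, 0.01, size=N))
print("worst-case relative error:", np.max(np.abs(I_noisy - I_ideal) / I_ideal))
```

Tiling many such small units behind digital I/O peripherals is what lets these mesh architectures scale a VMM beyond a single array.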
“…Several ReRAM-based in-situ mixed-signal DNN accelerators, such as ISAAC [18], Newton [19], PipeLayer [20], PRIME [17], PUMA [21], MultiScale [22], XNOR-RRAM [37], and RapidDNN [38], have been proposed in recent years. These designs utilize a combination of analog and digital units to speed up the computation.…”
Section: ReRAM Crossbar and ReRAM-based DNN Acceleration
Citation type: mentioning, confidence: 99%
“…For example, heavy edge analytics in battery-driven self-driving cars, where safety and energy are critical considerations, can lead to unexpected energy outages. To enhance the energy efficiency of such devices, many energy-aware systolic-array-based DNN accelerators have been developed recently [2], but the large array sizes they require for fast data processing yield meager energy gains in energy-constrained devices [3]. This problem can be addressed with approximate computing, which trades the accuracy of an application-specific system, by exploiting its intrinsic error resilience, for energy savings [4].…”
Section: Introduction
Citation type: mentioning, confidence: 99%