2021
DOI: 10.1109/access.2021.3053259

Power-Efficient Deep Convolutional Neural Network Design Through Zero-Gating PEs and Partial-Sum Reuse Centric Dataflow

Abstract: Convolutional neural networks (CNNs) have shown great success in areas such as object detection and pattern recognition, but at the cost of extremely high computational complexity and significant external memory access, which makes state-of-the-art deep CNNs difficult to implement on resource-constrained portable/wearable devices with limited battery capacity. To address this design challenge, a power-efficient CNN design through zero-gating processing elements (PEs) and partial-sum reuse centric dataflow is…

Cited by 5 publications (4 citation statements)
References 22 publications
“…We kept the PEs as simple as possible to maximize area efficiency. A zero detection and gating circuit, inspired by works like [7], [15], is included before the multiplier in order to avoid unnecessary switching when any input value is zero. Even though our work focuses on dense CNNs rather than sparse networks, this technique is inexpensive and can save power even at low sparsity levels (see Section III).…”
Section: A Systolic Array GEMM Engine
confidence: 99%
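The zero-gating technique quoted above can be illustrated with a minimal Python sketch (all names are hypothetical, not from the paper): when either operand of a multiply-accumulate is zero, the product is necessarily zero, so the PE can gate off its multiplier and pass the partial sum through unchanged, saving switching activity.

```python
def pe_mac(activation, weight, psum, stats):
    """One multiply-accumulate step of a hypothetical zero-gating PE.

    If either operand is zero the product is zero, so the multiplier can
    be gated off and the partial sum reused as-is. `stats` counts skipped
    vs. performed multiplies, purely for illustration.
    """
    if activation == 0 or weight == 0:
        stats["gated"] += 1          # multiplier inputs hold their value
        return psum                  # partial sum passes through unchanged
    stats["active"] += 1
    return psum + activation * weight


# Toy dot product with ReLU-sparse activations (zeros are common after ReLU).
activations = [3, 0, 0, 2, 0, 1]
weights = [1, 4, -2, 5, 7, -1]
stats = {"gated": 0, "active": 0}
psum = 0
for a, w in zip(activations, weights):
    psum = pe_mac(a, w, psum, stats)

print(psum)            # 3*1 + 2*5 + 1*(-1) = 12
print(stats["gated"])  # 3 of 6 multiplies skipped
```

Even at the modest 50% activation sparsity in this toy example, half the multiplies are gated, which is the low-sparsity power-saving effect the citing authors describe.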
“…DNN accelerators have been developed with various design approaches [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26]. Due to the data-centric property in recent ASIC-based DNN accelerators, in which a significantly large amount of data should be processed and transferred in and out of the accelerator chips, memory plays an important role.…”
Section: DNN Accelerators
confidence: 99%
“…Due to the data-centric property in recent ASIC-based DNN accelerators, in which a significantly large amount of data should be processed and transferred in and out of the accelerator chips, memory plays an important role. The typical on-chip global memory architectures can be simply classified into two types, i.e., those which use a unified buffer, such as those in [13,16,26], and those which use separate buffers for input feature maps, filter weights, and partial sums, such as those in [15,17]. Using a multi-bank-based unified global buffer can flexibly change the volume of the on-chip ifmaps, weights, and psums in different layers, while using separated buffers can transact different types of data in parallel.…”
Section: DNN Accelerators
confidence: 99%
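The trade-off described in the citation above can be sketched in a few lines of Python (class names and capacities are illustrative assumptions, not from any of the cited designs): a multi-bank unified buffer can re-partition its capacity among ifmaps, weights, and psums per layer, while separate buffers have fixed per-type capacities but allow the three data types to be accessed in parallel.

```python
class UnifiedBuffer:
    """One multi-bank on-chip buffer; capacity is re-partitioned per layer."""

    def __init__(self, total_kb):
        self.total_kb = total_kb

    def fits(self, ifmap_kb, weight_kb, psum_kb):
        # Flexible: any split is acceptable as long as it fits the total.
        return ifmap_kb + weight_kb + psum_kb <= self.total_kb


class SeparateBuffers:
    """Dedicated per-type buffers; sizes are fixed at design time, but each
    data type has its own port and can be transacted in parallel."""

    def __init__(self, ifmap_kb, weight_kb, psum_kb):
        self.caps = {"ifmap": ifmap_kb, "weight": weight_kb, "psum": psum_kb}

    def fits(self, ifmap_kb, weight_kb, psum_kb):
        need = {"ifmap": ifmap_kb, "weight": weight_kb, "psum": psum_kb}
        return all(need[k] <= self.caps[k] for k in need)


unified = UnifiedBuffer(total_kb=192)
separate = SeparateBuffers(ifmap_kb=64, weight_kb=64, psum_kb=64)

# A weight-heavy layer: the unified buffer absorbs the skewed working set,
# while the fixed separate buffers overflow on the weight partition.
print(unified.fits(32, 128, 32))   # True: 32 + 128 + 32 <= 192
print(separate.fits(32, 128, 32))  # False: 128 KB of weights > 64 KB bank
```

This is exactly the flexibility-versus-parallelism distinction the citing authors draw between unified-buffer designs [13, 16, 26] and separate-buffer designs [15, 17].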