Design space exploration of FPGA-based Deep Convolutional Neural Networks

Motamedi, Mohammad; Gysel, Philipp; Akella, Venkatesh; Ghiasi, Soheil

doi:10.1109/aspdac.2016.7428073

Cited by 173 publications

(72 citation statements)

References 9 publications


“…The result is then added to the previously obtained partial sum. However, as the kernel sizes (Nkx and Nky) are usually relatively small, stand-alone loop unrolling within one kernel window cannot provide enough parallelism to fully utilize the accelerator compute resources [34]. Fig.…”

Section: Convolutional Layer of a DNN (mentioning)

confidence: 99%
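The utilization limit described in the statement above can be made concrete with a small calculation. This is a minimal sketch, not taken from [34]: it assumes a hypothetical accelerator with `num_pes` identical multipliers where only the Nkx x Nky kernel window is unrolled, so at most Nkx * Nky multiplications are available to run in parallel. The function name and the 512-PE array size are illustrative assumptions.

```python
import math


def kernel_window_utilization(nkx: int, nky: int, num_pes: int) -> float:
    """Fraction of PEs kept busy when only the Nkx x Nky kernel window is unrolled.

    At most nkx * nky multiplications are independent within one window, so
    PEs beyond that product sit idle. If the window is larger than the PE
    array, it is processed in ceil(window / num_pes) passes.
    """
    window = nkx * nky
    passes = math.ceil(window / num_pes)
    return window / (passes * num_pes)


# Hypothetical 512-multiplier array: small kernels leave most PEs idle.
for k in (1, 3, 5, 7, 11):
    print(f"{k}x{k} kernel: utilization = {kernel_window_utilization(k, k, 512):.3f}")
```

Even an 11x11 window keeps fewer than a quarter of the 512 assumed PEs busy in this model, which is the motivation for the combined unrolling strategies discussed next.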

“…These loop unrolling types can be combined to further increase the parallelism in convolutional layer processing. For example, loop unrolling within the kernel window, across multiple input feature map channels, and across different kernels are employed together in [6], [34], [35] while loop unrolling within one kernel window and within one input feature map channel are utilized in [27].…”

Section: Convolutional Layer of a DNN (mentioning)

confidence: 99%
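To see why combining unrolling types helps, the following sketch counts cycles when the kernel-window loops (`pkx`, `pky`), the input-channel loop (`pif`) and the kernel (output-channel) loop (`pof`) are unrolled together. It is an idealized model under stated assumptions (one parallel MAC per PE per cycle, no memory stalls or control overhead, output-pixel loops left sequential); the class and field names are hypothetical, not from the cited works.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    nkx: int  # kernel width
    nky: int  # kernel height
    nif: int  # input feature-map channels
    nof: int  # output channels (number of kernels)
    nox: int  # output width
    noy: int  # output height


@dataclass(frozen=True)
class Unroll:
    pkx: int = 1  # unrolled within the kernel window (x)
    pky: int = 1  # unrolled within the kernel window (y)
    pif: int = 1  # unrolled across input channels
    pof: int = 1  # unrolled across kernels / output channels


def cycles(layer: ConvLayer, u: Unroll) -> int:
    """Cycles if pkx*pky*pif*pof MACs run in parallel every cycle.

    Each unrolled loop of bound N and factor P takes ceil(N / P) iterations;
    the output-pixel loops (nox, noy) stay sequential in this sketch.
    """
    return (math.ceil(layer.nkx / u.pkx) * math.ceil(layer.nky / u.pky)
            * math.ceil(layer.nif / u.pif) * math.ceil(layer.nof / u.pof)
            * layer.nox * layer.noy)


def utilization(layer: ConvLayer, u: Unroll) -> float:
    """Useful MACs divided by the MAC slots the PE array offers over the run."""
    macs = layer.nkx * layer.nky * layer.nif * layer.nof * layer.nox * layer.noy
    pes = u.pkx * u.pky * u.pif * u.pof
    return macs / (cycles(layer, u) * pes)


layer = ConvLayer(nkx=3, nky=3, nif=64, nof=128, nox=56, noy=56)
print(utilization(layer, Unroll(3, 3)))        # 9 PEs, fully busy
print(utilization(layer, Unroll(3, 3, 8, 8)))  # 576 PEs, fully busy
print(utilization(layer, Unroll(3, 3, 7, 8)))  # 504 PEs, some idle
```

In this model, `Unroll(3, 3, 8, 8)` keeps 576 multipliers fully busy, whereas `Unroll(3, 3, 7, 8)` wastes about 8.6% of its slots because ceil(64 / 7) = 10 iterations cover 70 channel slots for only 64 real channels. Choosing unroll factors that divide the loop bounds is therefore part of the design-space exploration.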


Software-Defined Design Space Exploration for an Efficient DNN Accelerator Architecture

Che et al., 2021, IEEE Trans. Comput.


Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high computational complexity of DNNs often necessitates extremely fast and efficient hardware. The problem gets worse as the size of neural networks grows exponentially. As a result, customized hardware accelerators have been developed to accelerate DNN processing without sacrificing model accuracy. However, previous accelerator design studies have not fully considered the characteristics of the target applications, which may lead to sub-optimal architecture designs. On the other hand, new DNN models have been developed for better accuracy, but their compatibility with the underlying hardware accelerator is often overlooked. In this article, we propose an application-driven framework for architectural design space exploration of DNN accelerators. This framework is based on a hardware analytical model of individual DNN operations. It models the accelerator design task as a multi-dimensional optimization problem. We demonstrate that it can be efficaciously used in application-driven accelerator architecture design: we use the framework to optimize the accelerator configurations for eight representative DNNs and select the configuration with the highest geometric mean performance. The geometric mean performance improvement of the selected DNN configuration relative to the architectural configuration optimized only for each individual DNN ranges from 12.0% to 117.9%. Given a target DNN, the framework can generate efficient accelerator design solutions with optimized performance and area. Furthermore, we explore the opportunity to use the framework for accelerator configuration optimization under simultaneous diverse DNN applications. 
The framework is also capable of improving neural network models to best fit the underlying hardware resources. We demonstrate that it can be used to analyze the relationship between the operations of the target DNNs and the corresponding accelerator configurations, based on which the DNNs can be tuned for better processing efficiency on the given accelerator without sacrificing accuracy.
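The abstract describes selecting, among candidate accelerator configurations, the one with the highest geometric mean performance over eight DNNs. A minimal sketch of that selection step follows, assuming performance figures have already been produced by some analytical model; the analytical model itself is not reproduced, and the function names, configuration names, and numbers are hypothetical.

```python
import math
from typing import Iterable, Mapping


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean of positive values, computed in log space for stability."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def select_config(perf: Mapping[str, Mapping[str, float]]) -> str:
    """Return the configuration with the highest geometric mean performance.

    perf[config][dnn] is a positive performance figure (e.g. inferences/s);
    every configuration is assumed to be evaluated on the same set of DNNs.
    """
    return max(perf, key=lambda cfg: geometric_mean(perf[cfg].values()))


# Made-up values for illustration only, not results from Che et al.
perf = {
    "cfg_a": {"net1": 100.0, "net2": 10.0},  # excellent on net1, poor on net2
    "cfg_b": {"net1": 60.0, "net2": 40.0},   # balanced across both networks
}
print(select_config(perf))
```

With these toy numbers the arithmetic mean would favor `cfg_a` (55 vs. 50), while the geometric mean favors `cfg_b` (about 49.0 vs. 31.6). The geometric mean penalizes a configuration that performs very poorly on any single network, which suits the goal of one accelerator serving several DNNs.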


“…For convolution layers, in which the processing is described in listing 6a, finding the optimal PE configuration can be seen as a loop optimization problem [39,9,28] [77,65,40,78,36,79,80,43]. This problem is addressed by applying loop optimization techniques such as loop unrolling, loop tiling or loop interchange to the 7 nested loops of listing 6a.…”

Section: SIMD Accelerators and Loop Optimization (mentioning)

confidence: 99%
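Listing 6a of the citing survey is not reproduced here. As an assumption, the sketch below uses a common formulation of the seven loops (batch, output channels, input channels, output rows, output columns, kernel rows, kernel columns) with square kernels and no padding. It then applies two of the named transformations: tiling the output-channel loop by a hypothetical tile size `tm`, and interchanging the intra-tile loop to the innermost position. Plain Python lists keep the example dependency-free; it illustrates loop structure, not performance.

```python
from typing import List

Tensor4 = List[List[List[List[float]]]]


def _dims(x: Tensor4, w: Tensor4, stride: int):
    """Return (N, M, C, K, OH, OW) for input x[N][C][H][W] and weights w[M][C][K][K]."""
    n, c, h, wd = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    m, k = len(w), len(w[0][0])
    return n, m, c, k, (h - k) // stride + 1, (wd - k) // stride + 1


def _zeros(n: int, m: int, oh: int, ow: int) -> Tensor4:
    return [[[[0.0] * ow for _ in range(oh)] for _ in range(m)] for _ in range(n)]


def conv_7_loops(x: Tensor4, w: Tensor4, stride: int = 1) -> Tensor4:
    """Reference convolution (no padding) written as seven nested loops."""
    N, M, C, K, OH, OW = _dims(x, w, stride)
    y = _zeros(N, M, OH, OW)
    for n in range(N):                              # 1: batch
        for m in range(M):                          # 2: output channels (kernels)
            for c in range(C):                      # 3: input channels
                for oy in range(OH):                # 4: output rows
                    for ox in range(OW):            # 5: output columns
                        for ky in range(K):         # 6: kernel rows
                            for kx in range(K):     # 7: kernel columns
                                y[n][m][oy][ox] += (w[m][c][ky][kx] *
                                    x[n][c][oy * stride + ky][ox * stride + kx])
    return y


def conv_tiled(x: Tensor4, w: Tensor4, tm: int, stride: int = 1) -> Tensor4:
    """Same computation after tiling loop 2 by tm and moving the intra-tile
    loop innermost. Its tm iterations are independent and reuse the same input
    pixel, making them a natural candidate for unrolling onto tm parallel PEs."""
    N, M, C, K, OH, OW = _dims(x, w, stride)
    y = _zeros(N, M, OH, OW)
    for n in range(N):
        for m0 in range(0, M, tm):                  # tile loop over output channels
            for c in range(C):
                for oy in range(OH):
                    for ox in range(OW):
                        for ky in range(K):
                            for kx in range(K):
                                xv = x[n][c][oy * stride + ky][ox * stride + kx]
                                for m in range(m0, min(m0 + tm, M)):  # intra-tile loop
                                    y[n][m][oy][ox] += w[m][c][ky][kx] * xv
    return y
```

Both functions accumulate each output element in the same order, so they return identical results. Only the loop schedule, which is what the exploration optimizes, differs.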

“…To address this optimization problem, a brute-force exploration is performed, as in [39,28,77,65,40,78]. This exploration is usually driven by the Roofline method [82] in order to select the feasible design solutions that match the maximum computational throughput and the maximum memory bandwidth a given platform can deliver [39,40,41]. The design space can also be explored by means of heuristic search algorithms, as proposed for instance in [35].…”

Section: Design Space Exploration (mentioning)

confidence: 99%
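A minimal sketch of Roofline-driven brute-force exploration follows. It uses a simplified model in the spirit of the cited works, not the exact formulation of [39]. A design is a pair of tile sizes (`tm`, `tn`) for the output- and input-channel loops and needs `tm * tn` multipliers. Its computational roof is the operation count divided by the cycle count, and its attainable throughput is capped by the computation-to-communication (CTC) ratio, in operations per byte of off-chip traffic, times the memory bandwidth. The traffic estimate, the `mac_budget` constraint, and the default 4-byte words and 8 bytes/cycle bandwidth are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Layer:
    m: int      # output channels
    n: int      # input channels
    r: int      # output rows
    c: int      # output columns
    k: int      # square kernel size
    s: int = 1  # stride


def design_point(layer: Layer, tm: int, tn: int, bytes_per_word: int,
                 bw_bytes_per_cycle: float) -> Tuple[float, float, float]:
    """(attainable ops/cycle, computational roof, CTC ratio) for tiles (tm, tn)."""
    m_tiles, n_tiles = math.ceil(layer.m / tm), math.ceil(layer.n / tn)
    kk = layer.k * layer.k
    total_ops = 2 * layer.m * layer.n * layer.r * layer.c * kk   # 1 MAC = 2 ops
    cycles = m_tiles * n_tiles * layer.r * layer.c * kk          # tm*tn MACs per cycle
    comp_roof = total_ops / cycles
    in_h = (layer.r - 1) * layer.s + layer.k
    in_w = (layer.c - 1) * layer.s + layer.k
    # Approximate off-chip traffic in words: each (m, n) tile pair reloads its
    # input channels and weights; outputs are written back once.
    words = (m_tiles * n_tiles * (tn * in_h * in_w + tm * tn * kk)
             + layer.m * layer.r * layer.c)
    ctc = total_ops / (words * bytes_per_word)                   # ops per DRAM byte
    attainable = min(comp_roof, ctc * bw_bytes_per_cycle)        # Roofline bound
    return attainable, comp_roof, ctc


def explore(layer: Layer, mac_budget: int, bytes_per_word: int = 4,
            bw_bytes_per_cycle: float = 8.0) -> Optional[Tuple[int, int, float, float]]:
    """Brute-force search over all (tm, tn) with tm * tn <= mac_budget.

    Returns (tm, tn, attainable, ctc) of the best design, or None if no
    design fits the budget.
    """
    best = None
    for tm in range(1, layer.m + 1):
        for tn in range(1, layer.n + 1):
            if tm * tn > mac_budget:
                continue  # needs more multipliers than the device offers
            att, _, ctc = design_point(layer, tm, tn, bytes_per_word,
                                       bw_bytes_per_cycle)
            # Highest attainable throughput wins; ties go to the design with
            # the higher CTC ratio, i.e. the lower bandwidth demand.
            if best is None or (att, ctc) > (best[2], best[3]):
                best = (tm, tn, att, ctc)
    return best
```

The tie-break reflects the Roofline reasoning quoted above: among designs that reach the same throughput, the one with the higher CTC ratio leaves more bandwidth headroom on the target platform.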

Accelerating the CNN Inference on FPGAs

Abdelouahab, Pelcat, Berry, 2020, Deep Learning in Computer Vision


Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for dedicated and tailored hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs. The amount and diversity of research on the subject of CNN FPGA acceleration within the last 3 years demonstrates the tremendous industrial and academic interest. This paper presents a state-of-the-art of CNN inference accelerators over FPGAs. The computational workloads, their parallelism and the involved memory accesses are analyzed. At the level of neurons, optimizations of the convolutional and fully connected layers are explained and the performances of the different methods compared. At the network level, approximate computing and datapath optimization methods are covered and state-of-the-art approaches compared. The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel the future advances on efficient hardware deep learning.
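The workload analysis mentioned in the abstract can be illustrated by counting multiply-accumulates (MACs) and weights per layer. This is a minimal sketch with assumed layer shapes and hypothetical function names, not the survey's own model. The ratio of MACs to weights shows how often each weight is reused, which separates the compute-heavy convolutional layers from the memory-heavy fully connected ones.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConvShape:
    n_in: int   # input channels
    n_out: int  # output channels (kernels)
    out_h: int  # output rows
    out_w: int  # output columns
    k: int      # square kernel size


def conv_workload(s: ConvShape, batch: int = 1) -> Tuple[int, int, float]:
    """(MACs, weights, MACs per weight) for one convolutional layer."""
    macs = batch * s.n_out * s.n_in * s.out_h * s.out_w * s.k * s.k
    weights = s.n_out * s.n_in * s.k * s.k
    return macs, weights, macs / weights  # each weight reused out_h*out_w*batch times


def fc_workload(n_in: int, n_out: int, batch: int = 1) -> Tuple[int, int, float]:
    """(MACs, weights, MACs per weight) for one fully connected layer."""
    macs = batch * n_in * n_out
    weights = n_in * n_out
    return macs, weights, macs / weights  # each weight used once per input vector


# Assumed, illustrative shapes.
print(conv_workload(ConvShape(n_in=64, n_out=128, out_h=56, out_w=56, k=3)))
print(fc_workload(4096, 1000))
```

With a batch of one, every fully connected weight is fetched for a single MAC, so such layers tend to be limited by memory bandwidth. A 3x3 convolution producing a 56x56 output reuses each weight 3136 times and tends to be limited by compute. Batching raises the reuse of fully connected weights proportionally.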


“…FPGAs have more advantages than other digital controllers in high-speed operation, low power consumption, parallel processing and reconfigurable design. FPGA-based convolutional neural networks are effectively used in image identification [6]. An FPGA accelerator for 3D convolution avoids repeatedly loading the feature maps being processed [7]. Deep convolutional neural networks run 1.9 to 250 times faster on FPGA devices [2] [8]. Controller analyses for nonlinear systems have been reported [9][10][11][12][13][14][15][16][17][18]. The real-time implementation of NTT-based convolution is evaluated on two FPGA devices, the Xilinx Spartan 3A DSP FPGA and the Xilinx Virtex 6 FPGA.…”

Section: Introduction (mentioning)

confidence: 99%

Performance analysis of number theoretic transform-based convolution using field programmable gate array

Kumar, 2018, IJET


This paper presents a convolution operation based on the Number Theoretic Transform (NTT) for two input sequences of length n = 8. Computing the convolution of two n-point sequences with the Fast Fourier Transform requires a complex design, which leads to high power consumption. The NTT instead evaluates the convolution using a matrix of modular values. Because the NTT is an integer transform, the design is comparatively simple. The NTT-based convolution is developed in VHDL (Very High Speed Integrated Circuit Hardware Description Language), and the real-time implementation of the proposed method is validated on Xilinx Spartan family FPGA devices. Power, speed and area are evaluated and compared for the Spartan 3A DSP and Virtex 6 FPGA devices.
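The abstract states only that the NTT evaluates the convolution with a matrix of modular values. The sketch below illustrates that arithmetic in Python rather than VHDL. It assumes the prime modulus p = 257 (so that 8 divides p - 1) and the primitive root 3; both are my choices, not parameters from the paper. The forward transform is written as a naive matrix-vector product, matching the "matrix" description, and the result equals the ordinary cyclic convolution as long as every true output value is below p.

```python
P = 257  # assumed prime modulus; 8 divides P - 1 = 256
G = 3    # primitive root modulo 257


def ntt(a, omega, p=P):
    """Naive O(n^2) NTT as a matrix-vector product: A[k] = sum_j a[j] * omega^(j*k) mod p."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, p) for j in range(n)) % p for k in range(n)]


def ntt_cyclic_convolution(x, h, p=P, g=G):
    """Cyclic convolution of two length-n sequences via forward NTT, pointwise product, inverse NTT."""
    n = len(x)
    if len(h) != n or (p - 1) % n:
        raise ValueError("sequences must have equal length n, with n dividing p - 1")
    omega = pow(g, (p - 1) // n, p)           # primitive n-th root of unity mod p
    X, H = ntt(x, omega, p), ntt(h, omega, p)
    Y = [(a * b) % p for a, b in zip(X, H)]   # pointwise product in the transform domain
    omega_inv = pow(omega, p - 2, p)          # modular inverses via Fermat's little theorem
    n_inv = pow(n, p - 2, p)
    return [(v * n_inv) % p for v in ntt(Y, omega_inv, p)]


def direct_cyclic_convolution(x, h, p=P):
    """Reference: y[k] = sum_j x[j] * h[(k - j) mod n], reduced mod p."""
    n = len(x)
    return [sum(x[j] * h[(k - j) % n] for j in range(n)) % p for k in range(n)]


x = [1, 2, 3, 4, 0, 0, 0, 0]
h = [1, 1, 1, 0, 0, 0, 0, 0]
print(ntt_cyclic_convolution(x, h))
```

Every step uses only integer additions, multiplications and modular reductions, with no floating-point twiddle factors. This is the property the abstract credits for the comparatively simple hardware design. Zero-padding both inputs, as in the example, makes the cyclic result equal to the linear convolution.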

