PipeCNN: An OpenCL-based open-source FPGA accelerator for convolution neural networks

Wang, Dong; Xu, Ke; Jiang, Diankun

doi:10.1109/fpt.2017.8280160

Cited by 77 publications

(74 citation statements)

References 7 publications

“…The developed hardware architectures consist of C++ host controllers and multiple OpenCL kernels, which are accelerated using either an FPGA or a GPU. For x86-based systems, OpenCL accelerated kernels using FPGAs typically reside on an FPGA development board, which is connected to a separate independent host system through the PCIe express interface [2]. For ARM-based systems, the FPGA is typically connected to a Hard Processor System (HPS) on a SoC through specialized bridges, as in the case of the Intel DE1-SoC development board that we used here.…”

Section: A Software Architecture

mentioning

confidence: 99%

Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL

Lammie

Wang

Azghadi

2019

2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS)

Recent technological advances have proliferated the available computing power, memory, and speed of modern Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Consequently, the performance and complexity of Artificial Neural Networks (ANNs) is burgeoning. While GPU-accelerated Deep Neural Networks (DNNs) currently offer state-of-the-art performance, they consume large amounts of power. Training such networks on CPUs is inefficient, as data throughput and parallel computation is limited. FPGAs are considered a suitable candidate for performance critical, low power systems, e.g. the Internet of Things (IOT) edge devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development environment, networks described using the high-level OpenCL framework can be accelerated on heterogeneous platforms. Moreover, the resource utilization and power consumption of DNNs can be further enhanced by utilizing regularization techniques that binarize network weights. In this paper, we introduce, to the best of our knowledge, the first FPGA-accelerated stochastically binarized DNN implementations, and compare them to implementations accelerated on both GPUs and FPGAs. All our developed networks are trained and benchmarked using the popular MNIST and CIFAR-10 datasets. For our binarized and conventional FPGA-based networks, we achieve a >16-fold improvement in power consumption, compared to their GPU-accelerated counterparts. Also, our binarized FPGA-based networks require >25% shorter inference times, compared to their GPU-based counterparts.

“…Our work completes their analysis. The group of Don Wang [17] developed an FPGA framework for image classification and a comparison of the CNN models AlexNet and VGG-16 [7] on both Altera and Xilinx FPGAs. In this case, the shortest classification time is achieved by Altera DE5-net and is 23 FPS for AlexNet and 1.4 FPS for VGG-16, with a power consumption of 27.3 Watt and 29.8 Watt respectively.…”

Section: Related Work

mentioning

confidence: 99%

Convolutional Neural Networks on Embedded Automotive Platforms: A Qualitative Comparison

Brilli

Burgio

Bertogna

2018

2018 International Conference on High Performance Computing & Simulation (HPCS)

In the last decade, the rise of power-efficient, heterogeneous embedded platforms paved the way to the effective adoption of neural networks in several application domains. Especially, many-core accelerators (e.g., GPUs and FPGAs) are used to run Convolutional Neural Networks, e.g., in autonomous vehicles, and industry 4.0. At the same time, advanced research on neural networks is producing interesting results in computer vision applications, and NN packages for computer vision object detection and categorization such as YOLO, GoogleNet and AlexNet reached an unprecedented level of accuracy and performance. With this work, we aim at validating the effectiveness and efficiency of most recent networks on state-of-the-art embedded platforms, with commercial-off-the-shelf System-on-Chips such as the NVIDIA Tegra X2 and Xilinx Ultrascale+. In our vision, this work will support the choice of the most appropriate CNN package and computing system, and at the same time tries to "make some order" in the field.

“…There are several OpenCL frameworks for DNN deployment on FPGAs. Among these available frameworks, we are using PipeCNN created by Wang et al [19], which is the only open-source one. Our next step is to deploy pretrained models to the FPGA using PipeCNN and investigate the real-time inference performance of the FPGA for different models and input image sizes.…”

Section: Future Work

mentioning

confidence: 99%

“…ResNet-32 with input size 32 by 32 has only 0.46 million parameters, while AlexNet and VGG, both of which are used in the acceleration of CNNs on FPGAs by researchers like Suda et al [18] and Wang et al [19], have 60 million and 138 million parameters respectively. III. METHODOLOGY Our experiments are completed on a Ubuntu 16.04 LTS machine with an Intel i7-7700K 4.2 GHz CPU and an NVIDIA GTX 1050 Ti GPU with 16 GB memory.…”

mentioning

confidence: 99%

Benchmarking Deep Learning Frameworks with FPGA-suitable Models on a Traffic Sign Dataset

Lin

Ota

Owens

et al. 2018

2018 IEEE Intelligent Vehicles Symposium (IV)

We benchmark several widely used deep-learning frameworks for performing deep-learning-related automotive tasks (e.g., traffic sign recognition) that need to achieve realtime and high accuracy results with limited resources available on embedded platforms such as FPGAs. In our benchmarks, we use various input image sizes on models that are suitable for FPGA deployment, and investigate the training speed and inference accuracy of selected frameworks for these different sizes on a popular traffic sign recognition dataset. We report results by running the frameworks solely on the CPU as well as by turning on GPU acceleration. We also provide optimizations we apply to fine-tune the performance of the frameworks. We discover that Neon and MXNet deliver the best training speed and inference accuracy in general for all our test cases, while Tensorflow is always among the frameworks with the highest inference accuracies. We also observe that on the particular dataset we tested on (i.e., GTSRB), the image size of the region of interest does not necessarily affect the inference accuracy, and that using deep models, e.g., ResNet-32, which have longer training times, might not provide improvements to inference accuracy.
