Deep Convolutional Neural Networks (CNNs) are the state-of-the-art in image classification. Since CNN feed forward propagation involves highly regular parallel computation, it benefits from a significant speed-up when running on fine grain parallel programmable logic devices. As a consequence, several studies have proposed FPGA-based accelerators for CNNs. However, because of the large computational power required by CNNs, none of the previous studies has proposed a direct mapping of the CNN onto the physical resources of an FPGA, allocating each processing actor to its own hardware instance. In this paper, we demonstrate the feasibility of the so called direct hardware mapping (DHM) and discuss several tactics we explore to make DHM usable in practice. As a proof of concept, we introduce the HADDOC2 open source tool, that automatically transforms a CNN description into a synthesizable hardware description with platform-independent direct hardware mapping 1 .
The complexity of hardware systems is currently growing faster than the productivity of system designers and programmers. This phenomenon is called Design Productivity Gap and results in inflating design costs.In this paper, the notion of Design Productivity is precisely defined, as well as a metric to assess the Design Productivity of a High-Level Synthesis (HLS) method versus a manual hardware description. The proposed Design Productivity metric evaluates the trade-off between design efficiency and implementation quality. The method is generic enough to be used for comparing several HLS methods of different natures, opening opportunities for further progress in Design Productivity.To demonstrate the Design Productivity evaluation method, an HLS compiler based on the CAPH language is compared to manual VHDL writing. The causes that make VHDL lower level than CAPH are discussed. Versions of the sub-pixel interpolation filter from the MPEG HEVC standard are implemented and a design productivity gain of 2.3× in average is measured for the CAPH HLS method. It results from an average gain in design time of 4.4× and an average loss in quality of 1.9×.
Deep Neural Networks are becoming the de-facto standard models for image understanding, and more generally for computer vision tasks. As they involve highly parallelizable computations, Convolutional Neural Networks (CNNs) are well suited to current fine grain programmable logic devices. Thus, multiple CNN accelerators have been successfully implemented on FPGAs. Unfortunately, Field-Programmable Gate Array (FPGA) resources such as logic elements or Digital Signal Processing (DSP) units remain limited. This work presents a holistic method relying on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on FPGA. This method was tested when implementing a reconfigurable Optical Character Recognition (OCR) convolutional neural network on an Altera Stratix V device and varying both data representation and CNN topology in order to find the best combination in terms of DSP block utilization and classification accuracy. This exploration generated dataflow architectures of 76 CNN topologies with 5 different fixed point representation. Most efficient implementation performs 883 classifications/sec at 256 × 256 resolution using 8 % of the available DSP blocks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.