CNP: An FPGA-based processor for Convolutional Networks

Farabet, Clément; Poulet, Cyril; Han, Jefferson Y.; LeCun, Yann

doi:10.1109/fpl.2009.5272559

Cited by 305 publications

(157 citation statements)

References 10 publications

Supporting

Mentioning

149

Contrasting

Unclassified


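“…However, when training is repeated, the heavy data computation exhausts system resources, so training takes a long time [3]. To overcome this, dedicated processors for convolution processing are being researched and developed [4]. Among these, Google's TPU (Tensor Processing Unit) has been reported to improve computation speed by performing only specific functions [5] [6].…”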

Section: Introduction. In the field of machine learning, the CNN algorithm boasts a high recognition rate for image recognition and classification (unclassified)

A design of the ALU for Convolution Neural Network of operation processing

Nam

2017

AJMAHS


The CNN algorithm performs very well in image recognition, but it requires a large amount of computation, and training time grows as training data accumulates. To address this, various IPUs and TPUs have recently been developed to accelerate neural network operations; they are several times to several tens of times faster than conventional CPUs and GPUs. In this paper, we propose an ALU for efficient multiplication and addition in CNNs. The ALU design was implemented on the Xilinx VC-707 FPGA board using Verilog HDL. Twenty-five 8-bit modified Booth multipliers were arranged in a square matrix structure and process 200 bits per clock. To improve computation speed, the arithmetic unit performs parallel processing using pipelining. Experiments on the MNIST digit image database compared the measured convolutional neural network computation time of a GPU with that of the proposed structure.
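As a rough illustration of the arithmetic named in the abstract above, the following Python sketch models radix-4 modified Booth recoding for signed 8-bit operands and sums 25 such products, in line with the 25 × 8 = 200 operand bits per clock that the abstract reports. It is a behavioural model only, not the authors' Verilog design. The names `booth_radix4_multiply` and `mac_25`, and the reading of the square matrix as a 5 × 5 multiply-accumulate, are assumptions made for illustration.

```python
# Behavioural sketch of radix-4 (modified) Booth multiplication.
# Assumption: signed two's-complement operands with an even bit width.

# Booth digit for each 3-bit group (b[2i+1], b[2i], b[2i-1]),
# indexed by the group's value 0b000 .. 0b111.
BOOTH_DIGITS = (0, 1, 1, 2, -2, -1, -1, 0)


def booth_radix4_multiply(a: int, b: int, bits: int = 8) -> int:
    """Multiply two signed `bits`-wide integers via radix-4 Booth recoding of b."""
    if bits % 2 != 0:
        raise ValueError("bit width must be even for radix-4 recoding")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand out of range for the given bit width")

    # Two's-complement bit pattern of b with an implicit 0 appended at bit -1.
    extended = (b & ((1 << bits) - 1)) << 1

    product = 0
    for i in range(bits // 2):
        group = (extended >> (2 * i)) & 0b111   # overlapping 3-bit window
        digit = BOOTH_DIGITS[group]             # one of -2, -1, 0, +1, +2
        product += (digit * a) << (2 * i)       # partial product weighted by 4**i
    return product


def mac_25(window, kernel) -> int:
    """Sum of 25 products over two 5x5 grids (hypothetical use of the 25 multipliers)."""
    if len(window) != 5 or len(kernel) != 5:
        raise ValueError("expected 5 rows in each grid")
    total = 0
    for row_w, row_k in zip(window, kernel):
        if len(row_w) != 5 or len(row_k) != 5:
            raise ValueError("expected 5 columns in each row")
        for w, k in zip(row_w, row_k):
            total += booth_radix4_multiply(w, k)
    return total
```

Radix-4 recoding halves the number of partial products (four instead of eight for 8-bit operands), which is the usual motivation for choosing a modified Booth multiplier in hardware. In this model, `booth_radix4_multiply(-128, 127)` returns -16256, the same value as the ordinary product.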


“…The proposed design consists of a systolic 2D array of programmable processing tiles which operates under the control of a CPU. The original work [4] achieved an average throughput of around 4 GOp/s at 15W on a Xilinx Spartan-3A DSP 3400 FPGA. An improved version of this architecture was presented in [5], named NeuFlow.…”

Section: Related Work (mentioning)

confidence: 99%

“…A common element of all these works is the assumption that the training phase has been performed offline by software and hence they concentrate on the classification task, similarly to fpgaConvNet. One of the earliest works is the one which started under the name CNP [4]. The proposed design consists of a systolic 2D array of programmable processing tiles which operates under the control of a CPU.…”

Section: Related Work (mentioning)

confidence: 99%

fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs

Venieris

Bouganis

2016

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

222

103


Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy to many emerging classification problems. However, ConvNet classification is a computationally heavy task, suffering from rapid complexity scaling. This paper presents fpgaConvNet, a novel domain-specific modelling framework together with an automated design methodology for the mapping of ConvNets onto reconfigurable FPGA-based platforms. By interpreting ConvNet classification as a streaming application, the proposed framework employs the Synchronous Dataflow (SDF) model of computation as its basis and proposes a set of transformations on the SDF graph that explore the performance-resource design space, while taking into account platform-specific resource constraints. A comparison with existing ConvNet FPGA works shows that the proposed fully-automated methodology yields hardware designs that improve the performance density by up to 1.62× and reach up to 90.75% of the raw performance of architectures that are hand-tuned for particular ConvNets.


“…Tanomoto et al. implemented EMAX [10], a CNN on the CGRA with multiple local memory banks; the main difference from our work is the data stationary. FPGA implementations of CNN coprocessors using DSP blocks were also proposed [11] [12]. A LUT-based FPGA implementation [13] was proposed; this transforms the weight values using a mathematical technique to replace the multipliers with LUTs and adders.…”

Section: Related Work (mentioning)

confidence: 99%

A Multithreaded CGRA for Convolutional Neural Network Processing

Ando

Takamaeda-Yamazaki

Ikebe

et al.

2017


The convolutional neural network (CNN) is an essential model for achieving high accuracy in various machine learning applications, such as image recognition and natural language processing. One of the important issues for CNN acceleration with high energy efficiency and processing performance is efficient data reuse by exploiting the inherent data locality. In this paper, we propose a novel CGRA (Coarse Grained Reconfigurable Array) architecture with time-domain multithreading for exploiting input data locality. The multithreading on each processing element enables input data reuse across multiple computation periods. This paper presents the accelerator design and a performance analysis of the proposed architecture. We examine the structure of memory subsystems, as well as the architecture of the computing array, to supply required data with minimal performance overhead. We explore efficient architecture design alternatives based on the characteristics of modern CNN configurations. The evaluation results show that the available bandwidth of the external memory can be utilized efficiently when the output plane is wider (in earlier layers of many CNNs), while the input data locality can be utilized maximally when the number of output channels is larger (in later layers).

