Abstract. Field-programmable gate array (FPGA) technology can offer significantly higher performance at much lower power consumption than single-core and multicore CPUs and graphics processing units (GPUs) in many computational problems. Unfortunately, programming FPGAs directly in hardware description languages (HDLs) such as VHDL or Verilog is a difficult, non-trivial task that is not intuitive for C/C++/Java programmers. To bridge the gap between programming effectiveness and difficulty, the high level synthesis (HLS) approach is promoted by the main FPGA vendors. Nowadays, time-intensive calculations are mainly performed on GPU/CPU architectures, but they can also be performed successfully using the HLS approach. In this paper we implement a bandwidth selection algorithm for kernel density estimation (KDE) using HLS and show the techniques used to optimize the final FPGA implementation. We also show that the FPGA speedups, compared to highly optimized CPU and GPU implementations, are quite substantial. Moreover, the power consumption of FPGA devices is usually much lower than that of present-day CPUs and GPUs.

Key words: FPGA, high level synthesis, kernel density estimation, bandwidth selection, plug-in selector.

FPGA-based bandwidth selection for kernel density estimation using high level synthesis approach

In this paper we are concerned with the FPGA approach. In [10] the author considers how to use FPGAs for fast computation of PDFs using a direct programming approach in VHDL (very high speed integrated circuits hardware description language). However, the problem we address is of a different nature, as we concentrate on computing the optimal bandwidth for the PDF (see Section 2).

To develop the final FPGA design we use the high level synthesis (HLS) approach [8,16], in which no direct coding in a hardware description language (HDL), typically VHDL or Verilog, is needed.ᵃ
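To illustrate what "no direct HDL coding" can look like in practice, the following is a minimal sketch of an HLS-style C++ function, not the design used in this paper. The function name `kernel_sum`, the constant `N_SAMPLES`, the Gaussian kernel, and the bandwidth argument `h` are all illustrative assumptions. The `#pragma HLS PIPELINE` directive follows the style of Xilinx Vivado HLS; whether the paper's implementation uses this particular directive is not stated here.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical fixed sample size: HLS tools generally need loop bounds
// known at compile time in order to size the generated hardware.
constexpr std::size_t N_SAMPLES = 8;

// Sum of Gaussian kernel values exp(-0.5 * u * u), with u = (x - data[i]) / h.
// The name, the kernel choice, and the argument layout are assumptions
// made for illustration only.
float kernel_sum(const float data[N_SAMPLES], float x, float h) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < N_SAMPLES; ++i) {
// Vivado HLS-style directive requesting a pipelined loop.
// An ordinary C++ compiler does not recognize this pragma and ignores it.
#pragma HLS PIPELINE II=1
        const float u = (x - data[i]) / h;
        acc += std::exp(-0.5f * u * u);
    }
    return acc;
}
```

Because the function is plain C++, it can be compiled and checked with an ordinary compiler before being handed to a synthesis tool. The directive only affects how the loop is mapped to hardware.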
The remainder of the paper is organized as follows. In Section 2 we give the reader some preliminary information on KDE and bandwidth selection. In Section 3 we provide detailed mathematical formulas for calculating the optimal bandwidth using the PLUGIN method. In Section 4 we cover all the necessary details of our FPGA-based implementation. We also present the practical experiments carried out and discuss the results. In Section 5 we conclude the paper.
ᵃ It is worth noting that the OpenCL framework, which is commonly used by GPU programmers, is also becoming available for FPGA devices. Altera currently offers the Altera SDK for OpenCL for implementing OpenCL applications on FPGAs. Recently, Xilinx announced a similar solution, namely the SDAccel Development Environment for OpenCL, C, and C++.

Kernel density estimation and bandwidth selection

The univariate kernel density estimator f̂ for a random sample X_i (i = 1, 2, …, n), drawn from a common and usually unknown density function f, is given by