FPGA-GPU architecture for kernel SVM pedestrian detection

Bauer, Sebastian; Köhler, Sebastian; Doll, Konrad; Brunsmann, U.

doi:10.1109/cvprw.2010.5543772

Cited by 95 publications

(53 citation statements)

References 18 publications

“…NVidia's Compute Unified Device Architecture (CUDA) has been used in [7], [32], [33] in order to speedup SVM classification using the parallel computing resources of a GPU, showing improved results compared to CPU implementations. However, GPUs are power hungry devices compared to FPGAs [22], [34], (FPGAs consume approximately an order of magnitude less power as shown in [11]) and as such they are not suitable for power-constrained embedded applications such as image object classification.…”

Section: Related Work (mentioning)

confidence: 99%

“…It processes around 1024 16×16 window samples, corresponding to 256-dimensional vectors, per image, without downscaling the input image which simplifies the I/O and memory accesses. The hybrid FPGA-GPU pedestrian detection system [33] for 800×600 images is able to classify around 1000 windows. The lower throughput can be attributed to the larger feature size.…”

Section: E Related Work Comparison (mentioning)

confidence: 99%

Embedded Hardware-Efficient Real-Time Classification With Cascade Support Vector Machines

Kyrkou, Bouganis, Theocharides, et al. 2016

IEEE Trans. Neural Netw. Learning Syst.

“…A hybrid FPGA-GPU pedestrian detection is presented in [40] where the SVM is implemented on the GPU and a feature extraction algorithm on the FPGA for 800×600 images and achieves over 10 frames-per-second for the classification of 1000 windows. However, GPUs are power hungry devices compared to FPGAs [29], [41], (FPGAs consume approximately an order of magnitude less power as shown in [13]) and as such they are not suitable for power-constrained embedded applications.…”

Section: Related Work (mentioning)

confidence: 99%

“…Furthermore, it processes only around 1024 16×16 window samples, corresponding to 256-dimensional vectors, per image, and it does not downscale the input image which simplifies the I/O and memory accesses. The hybrid FPGA-GPU pedestrian detection system [40] for 800×600 images is able to classify around 1000 windows. The lower throughput can be attributed to the larger feature size; however, the number of processed windows is an order of magnitude less than our work.…”

Section: Related Work Comparison (mentioning)

confidence: 99%

Boosting the Hardware-Efficiency of Cascade Support Vector Machines for Embedded Classification Applications

Kyrkou, Theocharides, Bouganis, et al. 2017

Int J Parallel Prog

Support Vector Machines (SVMs) are considered as a state-of-the-art classification algorithm capable of high accuracy rates for a different range of applications. When arranged in a cascade structure, SVMs can efficiently handle problems where the majority of data belongs to one of the two classes, such as image object classification, and hence can provide speedups over monolithic (single) SVM classifiers. However, the SVM classification process is still computationally demanding due to the number of support vectors. Consequently, in this paper we propose a hardware architecture optimized for cascaded SVM processing to boost performance and hardware efficiency, along with a hardware reduction method in order to reduce the overheads from the implementation of additional stages in the cascade, leading to significant resource and power savings. The architecture was evaluated for the application of object detection on 800×600 resolution images on a Spartan 6 Industrial Video Processing FPGA platform achieving over 30 frames-per-second. Moreover, by utilizing the proposed hardware reduction method we were able to reduce the utilization of FPGA custom-logic resources by ~30%, and simultaneously observed ~20% peak power reduction compared to a baseline implementation.
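The early-rejection idea behind a cascade, as described in the abstract above, can be sketched in a few lines. This is a minimal illustration and not the architecture from the paper: the stages here are plain linear classifiers rather than kernel SVMs with support vectors, and the names `stage_score` and `cascade_classify`, along with every weight and sample value, are hypothetical.

```python
from typing import List, Sequence, Tuple

# A stage is (weights, bias). Real cascade SVM stages would use support
# vectors and a kernel; a linear score is assumed here to keep the sketch short.
Stage = Tuple[Sequence[float], float]


def stage_score(stage: Stage, x: Sequence[float]) -> float:
    """Linear decision value w.x + b for one stage."""
    weights, bias = stage
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def cascade_classify(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated).

    A sample is rejected at the first stage whose score is negative, so
    samples from the majority (background) class usually cost only the
    early, cheap stages.
    """
    for evaluated, stage in enumerate(stages, start=1):
        if stage_score(stage, x) < 0.0:
            return False, evaluated
    return True, len(stages)


# Made-up three-stage cascade and two made-up feature vectors.
stages = [([1.0, 0.0], -0.5), ([0.5, 0.5], -0.6), ([0.2, 1.0], -0.9)]
background = [0.1, 0.9]  # fails the first stage
candidate = [0.9, 0.8]   # passes all three stages

print(cascade_classify(stages, background))
print(cascade_classify(stages, candidate))
```

Because most windows in an image belong to the background class, the average cost per window in such a scheme is dominated by the first stages, which is the source of the speedup over a single monolithic classifier that the abstract mentions.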

“…Because of deeply pipelined architectures and lower power consumption, FPGA platforms often provide higher execution speed and better energy efficiency over GPUs [16]. An FPGA-GPU hybrid system was proposed in [17] using FPGA to extract HOG features and GPU to perform classification; it achieved a throughput of 10,000 detection windows per second for FPGA execution. Note that whole images (frames) were not tested.…”

Section: Background a Related Work (mentioning)

confidence: 99%
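The throughput figures quoted in these citation statements can be cross-checked with simple arithmetic. The sketch below assumes that "around 1000 windows" is a per-frame count and that "over 10 frames-per-second" applies to the same system, which the statements suggest but do not state outright. The function names are hypothetical. The 16×16 window figure refers to the compared system in the Kyrkou et al. statements, not to the FPGA-GPU system itself.

```python
def windows_per_second(windows_per_frame: int, frames_per_second: float) -> float:
    """Throughput implied by a per-frame window count and a frame rate."""
    return windows_per_frame * frames_per_second


def feature_dimension(window_width: int, window_height: int) -> int:
    """Length of the vector obtained by flattening one window of raw pixels."""
    return window_width * window_height


# About 1000 windows per frame at about 10 frames per second.
print(windows_per_second(1000, 10.0))

# A 16x16 window flattened into a single feature vector.
print(feature_dimension(16, 16))
```

Under these assumptions, the "over 10 frames-per-second for the classification of 1000 windows" figure and the "10,000 detection windows per second" figure describe the same order of throughput.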

Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs

Najjar, Roy-Chowdhury 2015

IEEE Trans. Circuits Syst. Video Technol.

The reliance on object or people detection is rapidly growing beyond surveillance to industrial and social applications. The Histogram of Oriented Gradients (HOG), one of the most popular object detection algorithms, achieves high detection accuracy but delivers just under one frame-per-second (fps) on a high-end CPU. FPGA accelerations of this algorithm are limited by the intensive floating-point computations. All current fixed-point HOG implementations use large bit-width to maintain detection accuracy, or perform poorly at reduced data precision. In this paper we introduce the full-image evaluation methodology to explore the FPGA implementation of HOG using reduced bit-width. This approach lessens the required area resources on the FPGA and increases the clock frequency and hence the throughput per device through increased parallelism. We evaluate the detection accuracy of the fixed-point HOG by applying state-of-the-art computer vision pedestrian detection evaluation metrics and show it performs as well as the original floating-point code from OpenCV. We then show our single FPGA implementation achieves a 68.7x higher throughput than a high-end CPU, 5.1x higher than a high-end GPU, and 7.8x higher than the same implementation using floating-point on the same FPGA. A power consumption comparison for different platforms shows our fixed-point FPGA implementation uses 130x less power than CPU, and 31x less energy than GPU to process one image.
