A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator

Zhu, Wenping; Liu, Leibo; Jiang, Guangli; Yin, Shouyi; Wei, Shaojun

doi:10.1109/tcsvt.2015.2469116
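
The figures in the title can be unpacked with a little arithmetic. The sketch below assumes that "1080p" means 1920 × 1080 pixels and that the 87.5 mW figure applies while sustaining 135 frames/s. Under those assumptions it derives pixel throughput and energy per frame and per pixel, which the title does not state directly.

```python
# Back-of-the-envelope figures derived only from the title's numbers.
# Assumptions: 1080p = 1920 x 1080, and 87.5 mW is drawn while running at 135 frames/s.
WIDTH, HEIGHT = 1920, 1080
FPS = 135
POWER_W = 87.5e-3

pixels_per_frame = WIDTH * HEIGHT
pixel_rate = pixels_per_frame * FPS                 # pixels per second
energy_per_frame_mj = POWER_W / FPS * 1e3           # millijoules per frame
energy_per_pixel_nj = POWER_W / pixel_rate * 1e9    # nanojoules per pixel

print(f"pixel rate: {pixel_rate / 1e6:.1f} Mpixel/s")
print(f"energy per frame: {energy_per_frame_mj:.3f} mJ")
print(f"energy per pixel: {energy_per_pixel_nj:.3f} nJ")
```

With these assumptions the script prints about 279.9 Mpixel/s, 0.648 mJ per frame, and 0.313 nJ per pixel.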

Cited by 10 publications

(3 citation statements)

References 26 publications

“…Wenping et al. have designed a binary-descriptor-based image feature extraction accelerator [17]. The proposed accelerator is implemented in the Verilog hardware description language and verified on an FPGA platform.…”

Section: Related Work (mentioning)

“…There are many algorithms used in feature extraction with different characteristics and performances. Many algorithms, such as scale-invariant feature transform (SIFT), speeded up robust features (SURF) and histogram-of-gradients-based (HoG) methods, are computationally expensive and require heavy memory access and storage [11] [16] [17]. Binary descriptors are more computationally efficient and require less memory [4] [13].…”

Section: Introduction (mentioning)
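
To make the efficiency claim concrete: a BRIEF-style binary descriptor is just a string of pairwise intensity comparisons, and two descriptors are compared with XOR and a bit count (the Hamming distance). The minimal sketch below assumes a 31 × 31 patch, a 256-bit descriptor, and a hypothetical uniformly random sampling pattern (`PAIRS`). None of these parameters come from the cited works, and practical implementations usually smooth the image first.

```python
import random

PATCH_RADIUS = 15  # assumption: 31x31 patch, not taken from the cited works
N_BITS = 256       # assumption: 256-bit descriptor

# Hypothetical sampling pattern: uniform random point pairs (x1, y1, x2, y2)
# inside the patch, fixed once so every keypoint uses the same tests.
_pattern_rng = random.Random(42)
PAIRS = [tuple(_pattern_rng.randint(-PATCH_RADIUS, PATCH_RADIUS) for _ in range(4))
         for _ in range(N_BITS)]


def brief_descriptor(img, x, y):
    """Pack N_BITS intensity comparisons around (x, y) into one int.

    img is a list of rows of grayscale values; (x, y) must lie at least
    PATCH_RADIUS pixels from every border. No smoothing is applied here.
    """
    desc = 0
    for i, (x1, y1, x2, y2) in enumerate(PAIRS):
        if img[y + y1][x + x1] < img[y + y2][x + x2]:
            desc |= 1 << i
    return desc


def hamming(a, b):
    """Number of differing bits: XOR followed by a population count."""
    return bin(a ^ b).count("1")


# Usage: a uniform brightness offset leaves every comparison unchanged.
img_rng = random.Random(0)
img = [[img_rng.randrange(256) for _ in range(64)] for _ in range(64)]
brighter = [[v + 10 for v in row] for row in img]
d1 = brief_descriptor(img, 32, 32)
d2 = brief_descriptor(brighter, 32, 32)
print(hamming(d1, d2))  # 0
print(N_BITS // 8, "bytes per descriptor vs", 128 * 4, "bytes for a float32 SIFT vector")
```

Both the descriptor (comparisons only) and the matching step (XOR plus popcount) avoid floating-point arithmetic, which is one reason binary descriptors map well onto compact hardware.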

Untitled

2021

International Journal of Advanced Research in Computer and Comm

Visual feature extraction is widely used in many computer vision algorithms, such as object detection, stereo matching, image matching, and visual SLAM. Real-time implementation of visual feature extraction suffers from long latency and heavy computation. GPUs are commonly used to accelerate computationally intensive applications, but they are power-hungry devices and have a fixed level of parallelism. FPGAs and ASICs can run such computationally intensive applications more efficiently. In this work, an efficient and optimized FPGA architecture is designed to accelerate visual feature extraction. The FAST and BRIEF algorithms are used because they are simple and efficient on mobile and embedded systems. The system uses separate clocks of different frequencies for input pixel streaming and for processing, which prevents stall cycles and achieves high speed. Multiple pixels are processed per clock cycle using an optimized parallel and pipelined architecture. The proposed architecture is implemented on a TERASIC DE2-115 FPGA board, tested with different image sizes, and achieves very high throughput. The system can detect features at up to 1000 frames per second for 480 × 480-pixel grayscale images.
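
The FAST detector named in this abstract classifies a pixel as a corner when enough contiguous pixels on a 16-pixel Bresenham circle of radius 3 are all brighter, or all darker, than the centre by a threshold. The plain-Python sketch below is a software reference only: the threshold `t = 20` and arc length `n = 9` (the common FAST-9 variant) are assumptions, since the abstract does not state them, and it does not model the multi-pixel-per-cycle hardware the abstract describes.

```python
# 16 offsets (dx, dy) of the radius-3 Bresenham circle, clockwise from the top.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """FAST segment test at (x, y): True if n contiguous circle pixels are
    all brighter than centre + t or all darker than centre - t."""
    p = img[y][x]
    # Classify each circle pixel: +1 brighter, -1 darker, 0 similar.
    states = []
    for dx, dy in CIRCLE:
        q = img[y + dy][x + dx]
        states.append(1 if q > p + t else (-1 if q < p - t else 0))
    run, prev = 0, 0
    # Append the first n-1 states so arcs that wrap past the start are found too.
    for s in states + states[: n - 1]:
        if s != 0 and s == prev:
            run += 1
        else:
            run = 1 if s != 0 else 0
        prev = s
        if run >= n:
            return True
    return False


def image_with_dark_arc(length):
    """Hypothetical 7x7 test image: flat 100, first `length` circle pixels set to 0."""
    img = [[100] * 7 for _ in range(7)]
    for dx, dy in CIRCLE[:length]:
        img[3 + dy][3 + dx] = 0
    return img


print(is_fast_corner(image_with_dark_arc(9), 3, 3))  # True
print(is_fast_corner(image_with_dark_arc(8), 3, 3))  # False
```

Each circle pixel needs only two comparisons against the centre, so a hardware pipeline can evaluate all 16 in parallel and detect the contiguous arc with simple logic.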

Section: Related Work (mentioning)

Section: Introduction (mentioning)

Untitled

2021

International Journal of Advanced Research in Computer and Comm

“…This leads to over 17% reduction of the external memory bandwidth. Also, rather than using centroid-based orientation that leads to high computational complexity, the approach in [20] is adopted, which relies on the comparison of adjacent pixels on the Bresenham circle to determine the orientation. Their work mainly focused on the architecture for FAST and Non-Maximal Suppression (NMS).…”

Section: A. Related Work (mentioning)
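
For context on the cost that the quoted work avoids: centroid-based orientation (the intensity-centroid method used by ORB) computes the first-order patch moments m10 = Σ x·I and m01 = Σ y·I and takes θ = atan2(m01, m10), which needs multiplications for every patch pixel. The sketch below assumes a square patch of radius r = 15 for simplicity. It is not the Bresenham-circle comparison method of [20], whose details are not given in this excerpt.

```python
import math


def centroid_orientation(img, x, y, r=15):
    """Intensity-centroid orientation theta = atan2(m01, m10) around (x, y).

    Uses a square (2r+1) x (2r+1) patch as a simplification (ORB uses a
    circular patch). img is a list of rows; (x, y) must be >= r from every border.
    """
    m10 = m01 = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            v = img[y + dy][x + dx]
            m10 += dx * v  # one multiply per pixel...
            m01 += dy * v  # ...and a second one
    return math.atan2(m01, m10)


def multiplies_per_keypoint(r):
    """Multiplications spent on the two first-order moments (excludes atan2)."""
    return 2 * (2 * r + 1) ** 2


# Usage: intensity rising to the right gives theta = 0; rising downward gives pi/2.
ramp_x = [[col for col in range(40)] for _ in range(40)]
ramp_y = [[row] * 40 for row in range(40)]
print(centroid_orientation(ramp_x, 20, 20))
print(centroid_orientation(ramp_y, 20, 20))
print(multiplies_per_keypoint(15))  # 1922
```

Nearly two thousand multiplications plus an arctangent per keypoint is the kind of load that motivates the comparison-based alternative described in the quote.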

High-Throughput and Area-Optimized Architecture for rBRIEF Feature Extraction

Pham, Tran, Lam

2019

IEEE Trans. VLSI Syst.

Feature matching is a fundamental step in many real-time computer vision applications such as Simultaneous Localization And Mapping (SLAM), motion analysis and stereo correspondence. The performance of these applications depends on the distinctiveness of the visual feature descriptors used and on the speed at which they can be extracted from video frames. When combined with standard key-point detectors, the rotation-aware Binary Robust Independent Elementary Features (rBRIEF) descriptor has been shown to outperform its counterparts. In this paper, we present a deep-pipelined stream processing architecture that is capable of extracting rBRIEF features from high-throughput video frames. To achieve a high processing rate with low-complexity hardware, the proposed architecture incorporates an enhanced moving summation strategy to calculate the key-points' patch moments and employs approximate computations to achieve patch rotation. Multiplier-less circuitry is used throughout the architecture to avoid costly multipliers. Implementation on an Altera Arria V device demonstrates that the proposed architecture achieves a 53.3% reduction in hardware resources (adaptive logic modules) and 50% higher accuracy (in terms of average Hamming distance) compared with the state-of-the-art architecture. In addition, the proposed architecture can process high-resolution (1920×1080) images at 60 fps while consuming only 456.15 mW.
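
The idea behind computing patch moments with a moving summation can be sketched in software: when a square window slides one column to the right, the new moments follow from the old ones plus the column that leaves and the column that enters, instead of re-summing the whole patch. This is a generic running-sum formulation written for illustration, with a square patch and hypothetical helper names (`column_sums`, `sliding_moments`, `direct_moments`); it is not necessarily the enhanced strategy of Pham et al. The remaining multiplications are by the constants r and r + 1, which hardware can realize with shifts and adds.

```python
import random


def column_sums(img, cy, r):
    """For each column j: C[j] = sum of I and D[j] = sum of dy * I over rows cy-r..cy+r."""
    width = len(img[0])
    rows = range(-r, r + 1)
    C = [sum(img[cy + dy][j] for dy in rows) for j in range(width)]
    D = [sum(dy * img[cy + dy][j] for dy in rows) for j in range(width)]
    return C, D


def sliding_moments(img, cy, r):
    """Yield (cx, m00, m10, m01) for every (2r+1)x(2r+1) window centred on row cy,
    updating the moments incrementally as the window slides right."""
    C, D = column_sums(img, cy, r)
    width = len(C)
    cx = r
    m00 = sum(C[: 2 * r + 1])
    m10 = sum((j - cx) * C[j] for j in range(2 * r + 1))
    m01 = sum(D[: 2 * r + 1])
    yield cx, m00, m10, m01
    for cx in range(r + 1, width - r):
        out_col, in_col = cx - r - 1, cx + r
        m00 += C[in_col] - C[out_col]
        # Relative to the old centre, the leaving column had offset -r and the
        # entering column has offset r + 1; re-centring by one column then
        # subtracts each remaining column sum once, i.e. the new m00.
        m10 += r * C[out_col] + (r + 1) * C[in_col] - m00
        m01 += D[in_col] - D[out_col]
        yield cx, m00, m10, m01


def direct_moments(img, cx, cy, r):
    """Reference: recompute m00, m10, m01 from scratch."""
    m00 = m10 = m01 = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            v = img[cy + dy][cx + dx]
            m00 += v
            m10 += dx * v
            m01 += dy * v
    return m00, m10, m01


# Usage: the incremental results match a full recomputation at every position.
rng = random.Random(1)
img = [[rng.randrange(256) for _ in range(20)] for _ in range(9)]
for cx, m00, m10, m01 in sliding_moments(img, cy=4, r=3):
    assert (m00, m10, m01) == direct_moments(img, cx, 4, 3)
print("incremental moments match direct computation")
```

Per step, the update touches only two column sums instead of all (2r + 1)² pixels, which is what makes a streaming, one-column-per-cycle pipeline feasible.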
