Fully Convolutional Single-Crop Siamese Networks for Real-Time Visual Object Tracking

Lee, Dong-Hyun

doi:10.3390/electronics8101084

Cited by 6 publications

(4 citation statements)

References 33 publications


“…[6] further improves tracking speed by mapping feature calculations into Fourier space. With the development of deep learning, the discrimination ability of deep features has been improved, and [15,16] take advantage of deep features to track objects. To improve the accuracy of the model, [17,18] combine the deep features of different layers with semantic and spatial information, while [7] combines hand-crafted and deep features to enhance the discriminative ability of the model.…”

Section: Correlation Filter-based Methods (mentioning)

confidence: 99%
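The excerpt credits [6] with speeding up correlation-filter tracking by moving the correlation into Fourier space. A minimal sketch of why that works, assuming single-channel 2-D NumPy arrays and circular (wrap-around) boundaries, as is usual in correlation-filter formulations: the circular cross-correlation of a template x with a search map z equals the inverse FFT of conj(FFT(x)) · FFT(z), which replaces an O(N²) computation over N pixels with an O(N log N) one. The function names are hypothetical and are not taken from [6].

```python
import numpy as np


def circular_xcorr_direct(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Direct circular cross-correlation: out[dy, dx] = sum_ij x[i, j] * z[(i + dy) % H, (j + dx) % W]."""
    h, w = x.shape
    out = np.empty((h, w))
    for dy in range(h):
        for dx in range(w):
            # Rolling z by (-dy, -dx) aligns z[(i + dy) % H, (j + dx) % W] with x[i, j].
            out[dy, dx] = np.sum(x * np.roll(z, shift=(-dy, -dx), axis=(0, 1)))
    return out


def circular_xcorr_fft(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Same result via the correlation theorem: IFFT(conj(FFT(x)) * FFT(z))."""
    spectrum = np.conj(np.fft.fft2(x)) * np.fft.fft2(z)
    return np.real(np.fft.ifft2(spectrum))


rng = np.random.default_rng(0)
template = rng.standard_normal((8, 6))
search = np.roll(template, shift=(2, 3), axis=(0, 1))  # template moved by 2 rows, 3 columns
response = circular_xcorr_fft(template, search)
print(np.allclose(response, circular_xcorr_direct(template, search)))  # True
print(np.unravel_index(np.argmax(response), response.shape))          # peak at the shift (2, 3)
```

The response peaks where the search map best matches the template, so the argmax gives the target's displacement; the FFT route produces the whole response map at once.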

A Learning Frequency-Aware Feature Siamese Network for Real-Time Visual Tracking

et al. 2020


Visual object tracking by Siamese networks has achieved favorable performance in accuracy and speed. However, the features used in Siamese networks contain spatially redundant information, which increases computation and limits the discriminative ability of Siamese networks. Addressing this issue, we present a novel frequency-aware feature (FAF) method for robust visual object tracking in complex scenes. Unlike previous works, which select features from different channels or layers, the proposed method factorizes the feature map into multiple frequency components and reduces the spatially redundant low-frequency information. By reducing the resolution of the low-frequency map, computation is saved and the receptive field of the layer is increased, yielding more discriminative information. To further improve the performance of FAF, we design an innovative data-independent augmentation for object tracking that improves the discriminative ability of the tracker by enhancing the linear representation among training samples through convex combinations of images and tags. Finally, a joint judgment strategy that combines intersection-over-union (IoU) and classification scores is proposed to adjust the bounding-box result and improve tracking accuracy. Extensive experiments on five challenging benchmarks demonstrate that our FAF method performs favorably against state-of-the-art (SOTA) tracking methods while running at around 45 frames per second.
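The augmentation described in the abstract, convex combinations of training images and their tags, follows the general mixup recipe. A minimal sketch, assuming a batch of float images of shape (N, H, W, C), one-hot or soft label rows of shape (N, K), and a mixing weight drawn from Beta(alpha, alpha); the function name `mixup_batch` and the alpha values are illustrative assumptions, not the FAF paper's settings.

```python
import numpy as np


def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch.

    images: float array (N, H, W, C); labels: one-hot or soft labels (N, K).
    Returns the mixed images, mixed labels, and the weight lam ~ Beta(alpha, alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    partner = rng.permutation(len(images))
    # Convex combination: the same lam is applied to images and labels.
    mixed_images = lam * images + (1.0 - lam) * images[partner]
    mixed_labels = lam * labels + (1.0 - lam) * labels[partner]
    return mixed_images, mixed_labels, lam


rng = np.random.default_rng(42)
imgs = rng.random((4, 8, 8, 3))
tags = np.eye(2)[[0, 1, 1, 0]]  # one-hot labels for 4 samples, 2 classes
x_mix, y_mix, lam = mixup_batch(imgs, tags, alpha=0.4, rng=rng)
print(x_mix.shape, y_mix.sum(axis=1))  # labels still sum to 1 per sample
```

Because images and labels share the same weight, a sample that is 70% image A carries 70% of A's label, so the target stays a linear function of the input mix.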


“…The two-dimensional max-pooling engine is implemented by employing the multiple one-dimensional rank-tracking-based max-pooling engines, as depicted in Figure 6. The block marked with "M_H" represents the horizontal one-dimensional max-pooling engine shown in Equation (2). Specifically, y_p(i,j) is obtained from the highest-ranking value r_0 of the ranking-counting block illustrated in Figure 2.…”

Section: Multiplexer Switch (MS) Multiplexer (mentioning)

confidence: 99%
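The excerpt obtains each one-dimensional max-pooling output from the highest-ranking value r_0 of a ranking-counting block. As a software analogy only (the engine itself is hardware described in the citing paper), each element's rank can be counted as the number of window elements that outrank it, with ties going to the earlier position; exactly one element then has rank 0, and it is the window maximum. The names `rank_in_window` and `maxpool1d_by_rank` are hypothetical, and the sketch recomputes ranks for every window for clarity, which is not how a pipelined engine would be expected to work.

```python
def rank_in_window(window):
    """Rank of each element = number of elements that outrank it.

    An element is outranked by a larger value, or by an equal value at an
    earlier position, so ranks are unique and run from 0 to len(window) - 1.
    """
    ranks = []
    for p, v in enumerate(window):
        r = sum(1 for q, u in enumerate(window) if u > v or (u == v and q < p))
        ranks.append(r)
    return ranks


def maxpool1d_by_rank(x, k, stride):
    """1-D max pooling that picks, in each window, the element whose rank is 0."""
    out = []
    for start in range(0, len(x) - k + 1, stride):
        window = x[start:start + k]
        ranks = rank_in_window(window)
        out.append(window[ranks.index(0)])  # rank 0 (r_0) is the window maximum
    return out


print(maxpool1d_by_rank([3, 1, 4, 1, 5, 9, 2, 6], k=2, stride=2))  # [3, 4, 9, 6]
print(rank_in_window([2, 2, 1]))                                   # [0, 1, 2]
```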

“…Convolutional neural networks (CNNs) have demonstrated remarkable performance in various domains, including image classification, object detection, and speech recognition [1,2]. However, effectively integrating CNNs into embedded systems with limited power and size requirements remains a significant challenge.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient Two-Stage Max-Pooling Engines for an FPGA-Based Convolutional Neural Network

Hong, Choi, Joo

2023

Electronics


This paper proposes two max-pooling engines, named the RTB-MAXP engine and the CMB-MAXP engine, with a scalable window size parameter for FPGA-based convolutional neural network (CNN) implementation. The max-pooling operation for the CNN can be decomposed into two stages, i.e., a horizontal axis max-pooling operation and a vertical axis max-pooling operation. These two one-dimensional max-pooling operations are performed by tracking the rank of the values within the window in the RTB-MAXP engine and cascading the maximum operations of the values in the CMB-MAXP engine. Both the RTB-MAXP engine and the CMB-MAXP engine were implemented using VHSIC hardware description language (VHDL) and verified by simulations. The implementation results demonstrate that the 16 CMB-MAXP engines achieved a remarkable throughput of about 9 GBPS (gigabytes per second) while utilizing only about 3% of the available resources on the Xilinx Virtex UltraScale+ FPGA XCVU9P. On the other hand, the 16 RTB-MAXP engines exhibited somewhat lower throughput and resource utilization, although they did offer a slightly better latency when compared to the CMB-MAXP engines. In the comparison with existing techniques, the CMB-MAXP engine exhibited comparable implementation results in terms of the resource utilization and maximum operating frequency. It is crucial to note that only the proposed engines provide the features of runtime window scalability and boundary padding capability, which are essential requirements for CNN accelerators. The proposed max-pooling engines were employed and tested in our CNN accelerator targeting the CNN model YOLOv4-CSP-S-Leaky for object detection.
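The abstract splits max pooling into a horizontal one-dimensional pass followed by a vertical one, and says the CMB-MAXP engine cascades maximum operations while supporting a runtime window size and boundary padding. The sketch below is a NumPy software model of that decomposition, not the VHDL design: it assumes a single-channel feature map, pads borders with -inf so padded cells can never win, and checks the result against a direct two-dimensional max. The names `cascaded_max`, `maxpool_two_stage`, and `maxpool_direct` are hypothetical.

```python
import numpy as np


def cascaded_max(values):
    """Chain of two-input max operations: max(...max(max(v0, v1), v2)..., v_{k-1})."""
    acc = values[0]
    for v in values[1:]:
        acc = max(acc, v)
    return acc


def maxpool_two_stage(fmap, k, stride, pad):
    """2-D max pooling as a horizontal 1-D pass, then a vertical 1-D pass.

    k (window size), stride, and pad are ordinary runtime arguments. Borders are
    padded with -inf so a padded cell can never be the maximum; keep pad <= k // 2
    so that no window consists only of padding.
    """
    x = np.pad(fmap.astype(float), pad, mode="constant", constant_values=-np.inf)
    rows, cols = x.shape
    out_h = (rows - k) // stride + 1
    out_w = (cols - k) // stride + 1
    # Stage 1 (horizontal): max over k consecutive columns in every row.
    horiz = np.array([[cascaded_max(row[j * stride:j * stride + k]) for j in range(out_w)]
                      for row in x])
    # Stage 2 (vertical): max over k consecutive rows of the horizontal result.
    return np.array([[cascaded_max(horiz[i * stride:i * stride + k, j]) for j in range(out_w)]
                     for i in range(out_h)])


def maxpool_direct(fmap, k, stride, pad):
    """Reference: take the max of each k x k window directly."""
    x = np.pad(fmap.astype(float), pad, mode="constant", constant_values=-np.inf)
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    return np.array([[x[i * stride:i * stride + k, j * stride:j * stride + k].max()
                      for j in range(out_w)] for i in range(out_h)])


rng = np.random.default_rng(1)
fmap = rng.integers(-50, 50, size=(7, 7))
for k, stride, pad in [(2, 2, 0), (3, 1, 1), (5, 2, 2)]:
    assert np.array_equal(maxpool_two_stage(fmap, k, stride, pad), maxpool_direct(fmap, k, stride, pad))
```

The two-stage form works because the maximum over a k × k window equals the maximum of k row-wise maxima, and each horizontal result is reused by every vertical window that overlaps its row.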


“…Subsequently, several novel convolution kernel design methods have been proposed in quick succession. Howard et al. [37] proposed depthwise separable convolution, which factorizes a traditional convolution into two steps, namely depthwise convolution and pointwise convolution, greatly improving computational efficiency [38][39][40]. Zhang et al. [41] introduced group convolution followed by a channel-shuffling operation.…”

Section: Pedestrian Detection with CNNs (mentioning)

confidence: 99%
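The excerpt's efficiency claim for depthwise separable convolution can be checked by counting weights: a standard k × k convolution from C_in to C_out channels needs k·k·C_in·C_out weights, whereas a depthwise step (one k × k filter per input channel) plus a 1 × 1 pointwise step needs k·k·C_in + C_in·C_out. The sketch also shows channel shuffling as a reshape, transpose, and flatten over the channel axis of a (C, H, W) array. Biases are ignored, and the helper names are hypothetical.

```python
import numpy as np


def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def channel_shuffle(x, groups):
    """Interleave channels of a (C, H, W) array across `groups` groups."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    # (groups, C/groups, H, W) -> swap the first two axes -> flatten back to C channels
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


print(standard_conv_params(3, 32, 64), depthwise_separable_params(3, 32, 64))  # 18432 2336
channels = np.arange(6).reshape(6, 1, 1)
print(channel_shuffle(channels, groups=2).ravel())  # [0 3 1 4 2 5]
```

The weight ratio is 1/C_out + 1/k², about 0.127 for k = 3 and C_out = 64, and the shuffle lets each subsequent group see channels from every previous group.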

A Parallel Convolutional Neural Network for Pedestrian Detection

Zhu

2020

Electronics


Pedestrian detection is a crucial task in many vision-based applications, such as video surveillance, human activity analysis and autonomous driving. Recently, most existing pedestrian detection frameworks focus only on detection accuracy or on model parameters. However, how to balance detection accuracy and model parameters is still an open problem for the practical application of pedestrian detection. In this paper, we propose a parallel, lightweight framework for pedestrian detection, named ParallelNet. ParallelNet consists of four branches, each of which learns different high-level semantic features, and we fuse them into one feature map as the final feature representation. Subsequently, the Fire module, which includes Squeeze and Expand parts, is employed to reduce the model parameters: we replace some convolution modules in the backbone with Fire modules. Finally, the focal loss is introduced into ParallelNet for end-to-end training. Experimental results on the Caltech–Zhang and KITTI datasets show that, compared with single-branch networks such as ResNet and SqueezeNet, ParallelNet achieves improved detection accuracy with fewer model parameters and lower giga floating-point operations (GFLOPs).
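The abstract names two ingredients without spelling them out: Fire modules (a 1 × 1 squeeze layer feeding parallel 1 × 1 and 3 × 3 expand layers whose outputs are concatenated, as in SqueezeNet) and the focal loss. A minimal sketch, assuming no biases and the binary focal loss FL(p_t) = -α_t (1 - p_t)^γ log(p_t) with the commonly used defaults α = 0.25 and γ = 2; the channel sizes below are illustrative, not ParallelNet's configuration, and the function names are hypothetical.

```python
import math


def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weights of a Fire module (no biases): 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    squeeze_w = c_in * squeeze            # 1x1 conv: c_in -> squeeze channels
    expand1_w = squeeze * expand1x1       # 1x1 conv on the squeezed map
    expand3_w = 9 * squeeze * expand3x3   # 3x3 conv on the squeezed map
    return squeeze_w + expand1_w + expand3_w  # outputs concatenated: expand1x1 + expand3x3 channels


def conv3x3_params(c_in, c_out):
    """Weights of a plain 3x3 convolution (no bias), for comparison."""
    return 9 * c_in * c_out


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class; y is 1 or 0.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


print(fire_params(128, 16, 64, 64), conv3x3_params(128, 128))  # 12288 147456
# A well-classified example (p_t = 0.9) contributes less loss than a harder one (p_t = 0.6).
print(focal_loss(0.9, 1) < focal_loss(0.6, 1))  # True
```

With C_in = 128 and a 16-channel squeeze, the Fire module produces the same 128 output channels with 12× fewer weights than a plain 3 × 3 convolution; setting γ = 0 reduces the focal loss to α-weighted cross-entropy.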

