Proceedings of the 27th ACM International Conference on Multimedia 2019
DOI: 10.1145/3343031.3350928
Dense Feature Aggregation and Pruning for RGBT Tracking

Abstract: How to perform effective information fusion of different modalities is a core factor in boosting the performance of RGBT tracking. This paper presents a novel deep fusion algorithm based on the representations from an end-to-end trained convolutional neural network. To exploit the complementarity of features of all layers, we propose a recursive strategy to densely aggregate these features, yielding robust representations of target objects in each modality. In different modalities, we propose to prune the dens…
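The recursive dense-aggregation idea in the abstract can be illustrated with a toy sketch: feature maps from progressively deeper layers are resized to a common spatial size and fused into a running aggregate, so the final representation mixes low- and high-level cues. This is an assumption-laden stand-in, not the paper's network; the simple averaging fusion and nearest-neighbour resize below replace the learned operations of the actual method.

```python
import numpy as np

def densely_aggregate(layer_feats):
    """Recursively fuse per-layer feature maps (illustrative only).

    Each deeper layer's map is resized to the shallowest layer's spatial
    size and averaged into the running aggregate. Averaging is a toy
    stand-in for the learned fusion in the paper.
    """
    def resize(x, hw):
        # Nearest-neighbour resize to a common spatial size (toy stand-in
        # for learned up/down-sampling in a real network).
        h, w = x.shape[1:]
        rows = np.arange(hw[0]) * h // hw[0]
        cols = np.arange(hw[1]) * w // hw[1]
        return x[:, rows][:, :, cols]

    target_hw = layer_feats[0].shape[1:]
    agg = layer_feats[0]
    for feat in layer_feats[1:]:
        agg = 0.5 * (agg + resize(feat, target_hw))  # recursive fusion step
    return agg

# Three fake feature maps (channels, H, W), shallow to deep.
feats = [np.ones((4, 16, 16)), 2 * np.ones((4, 8, 8)), 4 * np.ones((4, 4, 4))]
out = densely_aggregate(feats)
print(out.shape)  # (4, 16, 16)
```

In the paper this aggregation is performed per modality (RGB and thermal) before pruning; the sketch shows only the single-modality recursion.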

Cited by 164 publications (116 citation statements). References 25 publications.
“…Along with the improvement of the above-mentioned methods, the corresponding performance of RGB-T trackers has been continuously upgraded. Representative works [2, 6–11] are based on sparse representation, correlation filtering, and deep learning. Li et al. [2] proposed a cross-modal manifold ranking algorithm, which mitigated the influence of background clutter during tracking.…”
Section: RGB-T Tracking
Confidence: 99%
“…The RGB-T object tracking problem is an extension of the traditional visual tracking task: given the initial position state of the target, the RGB and thermal infrared images are used jointly to continuously estimate the target position in subsequent frames. In recent years, several works have been carried out on RGB-T tracking; representative approaches fall roughly into two categories: tracking based on traditional hand-crafted features [1–7] and tracking based on deep learning [8–11]. The former category mostly builds on theoretical frameworks such as sparse representation [2–5], correlation filtering [6], and Bayesian filtering [7], and uses hand-crafted texture or local features to construct cross-modal object appearance models and state-estimation methods. The latter class builds effective target models from massive data by exploiting the powerful feature representation capabilities of deep neural networks.…”
Section: Introduction
Confidence: 99%
“…In particular, our tracker outperforms DAT, RT-MDNet and ECO by 12.0%/11%, 14.6%/11.5% and 12.1%/9.7% in PR/SR, respectively. It also performs better than DAPNet [40] by 0.9%/2.1% in PR/SR, and our algorithm runs 6 times faster. The overall promising performance of our method can be explained by the fact that FANet makes full use of hierarchical deep features and RGBT information to handle the challenges of significant appearance changes and adverse environmental conditions.…”
Section: Evaluation on GTOT Dataset
Confidence: 86%
“…Runtime analysis. Finally, we report the runtime of our FANet against the state-of-the-art trackers MDNet [16]+RGBT, MANet [49], DAPNet [40], CMR [9], and SGT [7], together with their tracking performance on the RGBT234 dataset, in Table IV. Our implementation uses PyTorch 0.4.1 on a 2.1 GHz Intel(R) Xeon(R) CPU E5-2620 with an NVIDIA GeForce GTX 2080Ti GPU, and the average tracking speed is 19 FPS.…”
Section: Analysis of Our Network
Confidence: 99%
“…Visual tracking [4,11,17,39,63] remains one of the most active and important research areas in computer vision; it aims to precisely predict the location of an arbitrary target in consecutive frames given an initial location (e.g., a bounding-box annotation). Although a variety of visual tracking models [6,31,56,64] have been developed, visual tracking is still an ongoing and challenging task due to large variations from occlusion, obscureness, fast motion and deformation (i.e., the common challenges shown in [51]).…”
Section: Introduction
Confidence: 99%