2020
DOI: 10.3390/s20144021

Learning Soft Mask Based Feature Fusion with Channel and Spatial Attention for Robust Visual Object Tracking

Abstract: We propose to improve visual object tracking by introducing a soft-mask-based low-level feature fusion technique. The proposed technique is further strengthened by integrating channel and spatial attention mechanisms. The proposed approach is integrated within a Siamese framework to demonstrate its effectiveness for visual object tracking. The proposed soft mask is used to give more importance to the target regions than to the other regions, to enable effective target feature representation and to in…
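The abstract only outlines the idea, so the following is a rough NumPy illustration rather than the authors' method. It assumes a Gaussian soft mask centred on a known target box, applies it to a low-level feature map as `F * (1 + M)` so target locations are amplified while background is left roughly unchanged, and then gates the result with parameter-free channel and spatial attention (a simplified, CBAM-style stand-in for learned attention modules). The mask shape, the `1 + M` fusion rule, and all helper names (`gaussian_soft_mask`, `channel_attention`, `spatial_attention`, `soft_mask_fusion`) are hypothetical.

```python
import numpy as np


def gaussian_soft_mask(h, w, box, sigma_scale=0.5):
    """Hypothetical soft mask: a 2-D Gaussian centred on the target box.

    box = (cx, cy, bw, bh) in feature-map coordinates.
    Values lie in (0, 1]; the box centre gets weight 1.
    """
    cx, cy, bw, bh = box
    ys, xs = np.mgrid[0:h, 0:w]
    sx, sy = sigma_scale * bw, sigma_scale * bh
    return np.exp(-(((xs - cx) ** 2) / (2 * sx ** 2)
                    + ((ys - cy) ** 2) / (2 * sy ** 2)))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    # feat: (C, H, W). Parameter-free sketch: squeeze each channel by
    # global average pooling, then gate the channel with a sigmoid.
    weights = sigmoid(feat.mean(axis=(1, 2)))  # shape (C,)
    return feat * weights[:, None, None]


def spatial_attention(feat):
    # Average over channels to get one (H, W) map, then gate every
    # spatial location with a sigmoid.
    weights = sigmoid(feat.mean(axis=0))  # shape (H, W)
    return feat * weights[None, :, :]


def soft_mask_fusion(low_feat, box):
    """Emphasise the target region, then apply channel and spatial attention."""
    _, h, w = low_feat.shape
    mask = gaussian_soft_mask(h, w, box)
    # Target centre is scaled by up to 2x; far background by roughly 1x.
    emphasised = low_feat * (1.0 + mask)[None, :, :]
    return spatial_attention(channel_attention(emphasised))


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))  # random stand-in for CNN features
fused = soft_mask_fusion(feat, box=(8, 8, 4, 4))
print(fused.shape)  # (8, 16, 16)
```

In a trained tracker the attention weights would come from learned layers (for example a small MLP for channel attention and a convolution for spatial attention); they are left parameter-free here so the data flow stays visible.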

Cited by 11 publications (4 citation statements)
References 95 publications
“…The outputs of the and layers consist of and respectively. The one-stage region proposal network is illustrated in detail in [20]. It can be employed to obtain the proposals for a visual tracker.…”
Section: Proposed Methods (mentioning; confidence: 99%)
“…Song et al [19] performed different kinds of adversarial networks to generate variable samples, which helped to identify richer representation for tracking. Fiaz et al [20] proposed a soft mask feature fusion mechanism, which can be easily integrated into the conventional Siamese tracking framework to enhance the discriminative capability when distinguish the target from the background. Gordon et al [21] introduced the real-time recurrent regression networks to combine the multiple appearance features and motion information together, then perform the spatial-temporal fusion to accomplish a tracking network that increases the precision of the tracking results.…”
Section: Related Work (mentioning; confidence: 99%)
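The statement above refers to the conventional Siamese tracking framework, of which SiamFC is the usual example. As a hedged sketch of that framework's core step, the code computes a SiamFC-style response map by sliding template features over search-region features and summing elementwise products across channels; the peak marks the most likely target position. Random arrays stand in for CNN embeddings, and the function name `cross_correlate` is hypothetical. Explicit loops are used for clarity rather than speed.

```python
import numpy as np


def cross_correlate(template, search):
    """Valid 2-D cross-correlation summed over channels (SiamFC-style response map).

    template: (C, hz, wz) exemplar features; search: (C, hx, wx) search features.
    Returns a (hx - hz + 1, wx - wz + 1) response map.
    """
    c, hz, wz = template.shape
    c2, hx, wx = search.shape
    assert c == c2, "template and search features need the same channel count"
    out = np.empty((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search[:, i:i + hz, j:j + wz]
            out[i, j] = np.sum(template * window)
    return out


rng = np.random.default_rng(1)
search = 0.1 * rng.standard_normal((4, 20, 20))  # low-energy background
template = rng.standard_normal((4, 6, 6))
search[:, 5:11, 7:13] += template                 # plant the target at row 5, col 7

resp = cross_correlate(template, search)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(resp.shape, tuple(int(v) for v in peak))
```

Because the planted target matches the template exactly, the response peaks where it was inserted; a feature-fusion module such as a soft mask would act on the features before this correlation step.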
“…In the field of image classification, such as end‐to‐end object detection with transformers [16], the modified transformer structure has secured good performance. In the field of target tracking, such as transformer tracking [4] and transformer meets tracker [17–19], the structures of the encoder and decoder have been modified and innovated according to the characteristics of target tracking task, and the algorithms have reached the most advanced level.…”
Section: Related Work (mentioning; confidence: 99%)
“…Table 2 shows the compared trackers and their corresponding accuracy, robustness, EAO score, and tracking speed. We consider GradNet [71], SCS-Siam [72], MemTrack [14], ECOhc [6], SATIN [73], DSiam [11], CSRDCF [74], SiamFC [10], DCFNet [75], DensSiam [76], DSST [66], and SRDCF [61] trackers for the comparison. The top three performed trackers in terms of VOT measure metrics are highlighted using red, green, and blue colors, respectively.…”
Section: Experiments on VOT2017 and VOT2018 Benchmark (mentioning; confidence: 99%)
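The statement above compares trackers by VOT accuracy, robustness, and EAO. A simplified sketch of the first two measures may help: accuracy is taken as the mean intersection-over-union (IoU) over frames where the tracker still overlaps the ground truth, and robustness as the number of failures, where a failure is counted each time the overlap drops to zero. This is an assumption-laden simplification: the official VOT protocol re-initialises the tracker after each failure and discards burn-in frames, and EAO is not computed here. Boxes are assumed to be `(x, y, w, h)`; the helper names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def accuracy_and_failures(pred, gt):
    """Simplified VOT-style accuracy (mean IoU while tracking) and failure count.

    A run of consecutive zero-overlap frames counts as one failure, since
    this sketch does not re-initialise the tracker the way VOT does.
    """
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    failures = 0
    prev_failed = False
    for o in overlaps:
        failed = o == 0.0
        if failed and not prev_failed:
            failures += 1
        prev_failed = failed
    tracked = [o for o in overlaps if o > 0.0]
    accuracy = sum(tracked) / len(tracked) if tracked else 0.0
    return accuracy, failures


gt = [(0, 0, 10, 10)] * 4
pred = [(0, 0, 10, 10), (5, 0, 10, 10), (50, 50, 10, 10), (0, 0, 10, 10)]
acc, fails = accuracy_and_failures(pred, gt)
print(round(acc, 3), fails)  # 0.778 1
```

The per-frame overlaps here are 1, 1/3, 0, and 1, so accuracy averages the three non-zero values (7/9) and the single zero-overlap frame counts as one failure.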