Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413611

S2SiamFC

Abstract: To exploit rich information from unlabeled data, we propose a novel self-supervised framework for visual tracking that can easily adapt state-of-the-art supervised Siamese-based trackers into unsupervised ones by exploiting the fact that an image and any cropped region of it form a natural pair for self-training. Besides common geometric transformation-based data augmentation and hard negative mining, we also propose adversarial masking, which helps the tracker to learn other context inform…
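The pairing idea in the abstract (an image and any cropped region of it form a training pair) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `crop` and `make_self_supervised_pair`, the square crop, and the use of the crop centre as the pseudo-label are all assumptions made for illustration, and the augmentation, hard negative mining, and adversarial masking mentioned in the abstract are left out.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def make_self_supervised_pair(
    image: Image, crop_size: int, rng: random.Random
) -> Tuple[Image, Image, Tuple[int, int]]:
    """Build one (template, search, label) triple from a single still image.

    Hypothetical helper: a random square crop plays the role of the template,
    the full image plays the role of the search region, and the crop centre
    (row, column) inside the search region is the free pseudo-label.
    """
    height, width = len(image), len(image[0])
    if crop_size > min(height, width):
        raise ValueError("crop_size must fit inside the image")
    # Sample the crop's top-left corner so the crop stays inside the image.
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    template = crop(image, top, left, crop_size, crop_size)
    center = (top + crop_size // 2, left + crop_size // 2)
    return template, image, center


# Usage: a toy 6 x 8 image in which every pixel value is unique.
toy_image = [[r * 8 + c for c in range(8)] for r in range(6)]
template, search, center = make_self_supervised_pair(toy_image, 3, random.Random(0))
print(center, template)
```

Because the crop location is known at sampling time, no human annotation is needed for the label; a Siamese tracker would then be trained to localize the template inside the search image.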

Cited by 34 publications (4 citation statements)
References 58 publications
“…Unsupervised Learning in 2D Tracking. To train a 2D tracker without labels, some methods construct template-search pairs from still frames to explore the self-supervised signals of the videos in the spatial dimension [47], [48]. However, these approaches are sensitive to appearance change, which is common in LiDAR data.…”
Section: Related Work (mentioning)
confidence: 99%
“…In subsequent research, SiamRPN [28] incorporates a region proposal network into the tracking framework for joint classification and regression. In addition, some further studies [29][30][31] have been completed, such as the deeper backbone, feature aggregation architecture, and anchor-free methods. Transformer structure [32] has been applied to machine translation and target detection with remarkable effects.…”
Section: Aerial Tracking (mentioning)
confidence: 99%
“…CycleSiam [31] extends the concept of consistency learning to Siamese network structures, while JSLTC [32] models video frames by computing an inter-frame affinity matrix and uses the obtained correlations for tracking. S2SiamFC [33] focuses on spatial monitoring and utilizes static frames to create training pairs. On the other hand, PUL [34] introduces contrastive learning to build a more discriminative model and uses robust loss functions during training to explore temporal correspondence blocks.…”
Section: Related Work (mentioning)
confidence: 99%