2022
DOI: 10.1007/978-3-031-20047-2_22

Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking

Cited by 119 publications (40 citation statements)
References: 45 publications
“…In this way, our MixFormer unifies the two processes of feature extraction and information integration with an iterative MAM based backbone, leading to a more compact, neat and effective end-to-end tracker. After our CVPR 2022 conference version [16], there are some co-occurent works in ECCV 2022 such as OSTrack [82] and SimTrack [9], which also use Transformer as backbone to perform both feature extraction and information fusion. Our work starts earlier and obtains better performance than them due to our customized design of localization head, well-explored pre-trained strategies and backbone, and effective online templates selection method.…”
Section: Tracking Paradigm (mentioning)
confidence: 99%
“…In addition, active and vibrant research has been conducted on transformer-based tracking methods that adopt a lightweight backbone for aerial tracking [54,55]. Unlike the trackers mentioned above, the research on trackers in which the backbone is replaced with transformers instead of existing CNNs also shows remarkable performance [60,61].…”
Section: Related Work Transformer In Visual Tracking (mentioning)
confidence: 99%
“…Because transformers were originally designed for sequence-to-sequence learning on textual data and have exhibited good performance, their ability to integrate global information has been gradually unveiled and transformers have been extended to other modern deep learning applications such as image classification (Liu et al., 2020; Chen C.-F. R. et al., 2021; He et al., 2021), reinforcement learning (Parisotto et al., 2020; Chen L. et al., 2021), face alignment (Ning et al., 2020), object detection (Beal et al., 2020; Carion et al., 2020), image recognition (Dosovitskiy et al., 2020) and object tracking (Yan et al., 2019, 2021a; Cao et al., 2021; Lin et al., 2021; Zhang J. et al., 2021; Chen B. et al., 2022; Chen et al., 2022b; Mayer et al., 2022). Based on CNNs and transformers, the DERT (Carion et al., 2020) applies a transformer to object detection tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…Over the past few years, visual object tracking has made significant advancements based on the development of convolutional neural networks due to the breakthroughs that have been made to generate more powerful backbones, such as deeper networks (He et al., 2016; Chen B. et al., 2022), efficient network structure (Howard et al., 2017), attention mechanism (Hu et al., 2018). Inspired by the way of the human brain process the overload information (Wolfe and Horowitz, 2004), the attention mechanism is utilized to enhance the vital features and surpass the unnecessary information of the input feature.…”
Section: Introduction (mentioning)
confidence: 99%