Summary
Siamese-based trackers have made great progress in visual tracking. However, sharing one network structure between the classification and regression tasks limits a tracker's ability to produce robust classification predictions and accurate regression predictions. In this paper, we propose an effective visual tracking framework, Siamese Disentangled Tracking-Head (SiamDTH), which disentangles classification and regression in Siamese-based networks from two aspects: feature decoupling and a differentiated tracking head. First, we gather features from receptive fields of different scales and ratios, and decouple the correlation features through two different feature fusion modes, one for classification and one for regression. Second, we design a differentiated structure for the sibling heads so that the parallel classification and regression tasks are each handled by a head suited to that task. Extensive experiments on visual tracking benchmarks including VOT2018, VOT2019 and OTB100 demonstrate that SiamDTH achieves state-of-the-art performance while running at real-time speed. Our source code is available at:
https://github.com/xl0312/SiamDTH.
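The feature-decoupling idea can be illustrated with a toy sketch. This is not the SiamDTH implementation. The abstract does not specify the two fusion modes, so element-wise mean (for classification) and element-wise max (for regression) are assumptions chosen only for illustration. The names `cross_correlate`, `fuse`, `mean` and `decoupled_features` are likewise hypothetical, and the small templates merely stand in for branches with different receptive fields.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def cross_correlate(search: Matrix, template: Matrix) -> Matrix:
    """Valid-mode 2D cross-correlation of a template over a search map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    return [
        [
            sum(search[i + a][j + b] * template[a][b]
                for a in range(th) for b in range(tw))
            for j in range(sw - tw + 1)
        ]
        for i in range(sh - th + 1)
    ]


def fuse(maps: Sequence[Matrix], op: Callable[[List[float]], float]) -> Matrix:
    """Combine same-shaped response maps element by element with `op`."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[op([m[r][c] for m in maps]) for c in range(cols)]
            for r in range(rows)]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def decoupled_features(search: Matrix,
                       templates: Sequence[Matrix]) -> Tuple[Matrix, Matrix]:
    """Return separate fused features for classification and regression."""
    # One response map per template (stand-ins for different receptive fields).
    responses = [cross_correlate(search, t) for t in templates]
    cls_feat = fuse(responses, mean)  # assumed fusion mode for classification
    reg_feat = fuse(responses, max)   # assumed fusion mode for regression
    return cls_feat, reg_feat


search = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
templates = [
    [[1.0, 0.0], [0.0, 1.0]],  # diagonal pattern
    [[1.0, 1.0], [1.0, 1.0]],  # uniform pattern
]
cls_feat, reg_feat = decoupled_features(search, templates)
```

The point of the sketch is only that the two tasks receive differently fused features instead of one shared correlation map. In a real tracker each feature would then pass to its own head, with learned convolutions in place of the fixed templates used here.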