To improve the weak tracking performance of the fully-convolutional Siamese network (SiamFC) in complex scenes, an object tracking framework that combines a Siamese network with a re-detection mechanism (Siam-RM) is proposed. The mechanism adopts the Siamese instance search tracker (SINT) as the re-detection network. When multiple peaks appear on the SiamFC response map, the more accurate re-detection network re-determines the location of the object. To adapt to changes in the object's appearance, a generative model is used to construct the SiamFC templates. In addition, the template is updated only when confidence is high, which prevents it from being contaminated. Objective evaluation on the popular online tracking benchmark (OTB) shows that the tracking accuracy and success rate of the proposed framework reach 79.8% and 63.8%, respectively. Results on several representative video sequences show that, compared with SiamFC, the framework is more accurate and robust in scenes with fast motion, occlusion, background clutter, and illumination variation.
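The multi-peak trigger described above can be illustrated with a minimal sketch. The abstract does not state how peaks are detected or when re-detection fires, so everything here is an assumption made for illustration: the 8-neighbourhood local-maximum test, the hypothetical `ratio` threshold, and the function names `find_peaks` and `needs_redetection`. Response values are assumed to be non-negative.

```python
from typing import List, Tuple

Peak = Tuple[int, int, float]


def find_peaks(response: List[List[float]]) -> List[Peak]:
    """Return (row, col, value) for each strict local maximum, strongest first.

    A cell counts as a peak when it is strictly greater than every cell in
    its 8-neighbourhood (an illustrative criterion, not the paper's).
    """
    rows, cols = len(response), len(response[0])
    peaks: List[Peak] = []
    for r in range(rows):
        for c in range(cols):
            value = response[r][c]
            neighbours = [
                response[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return sorted(peaks, key=lambda p: p[2], reverse=True)


def needs_redetection(response: List[List[float]], ratio: float = 0.8) -> bool:
    """Flag an ambiguous response map.

    Re-detection is requested when the second-strongest peak reaches at
    least `ratio` times the strongest one (hypothetical threshold).
    """
    peaks = find_peaks(response)
    if len(peaks) < 2:
        return False
    return peaks[1][2] >= ratio * peaks[0][2]
```

In a tracker loop, a `True` result would hand the candidate locations to the slower re-detection network, while a `False` result would keep the SiamFC estimate as is.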
Existing Siamese-network-based trackers are easily disturbed by large deformation, occlusion, and distractor objects in the background. By comparing these trackers, we observe that their monotonous positive pairs usually contain few challenging factors (occlusion, deformation, etc.), which may make the learned features less robust. In addition, the foreground information in the substantial training data is used directly, without deeper exploration, so the trackers cannot effectively discriminate the foreground from semantic backgrounds. In this paper, we modify the Siamese tracker by enriching the positive pairs and making fuller use of the foreground information. During the offline training phase, a simple sampling strategy enriches the challenging factors in positive pairs, which effectively enhances the robustness of the tracker. At the same time, we highlight the foreground information by padding the background, and use it to generate a novel padding loss that guides the tracker to pay less attention to distractors in the background. Moreover, an improved feature information fusion is adopted to update the template, so that the tracker can adapt to drastic appearance changes. Comprehensive experiments on the OTB and VOT benchmarks demonstrate that the proposed tracker achieves outstanding performance in both accuracy and robustness.
INDEX TERMS: Visual tracking, Siamese network, foreground information, feature information fusion.
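As a rough illustration of "padding the background" to highlight the foreground, the sketch below overwrites every pixel outside a bounding box with a constant. The abstract does not specify the fill value, the box convention, or how the padded image enters the padding loss, so the `(top, left, bottom, right)` half-open box, the `fill` parameter, and the name `pad_background` are all assumptions. The padding loss itself is not reconstructed here.

```python
from typing import List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def pad_background(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Return a copy of `image` with all pixels outside `box` set to `fill`.

    Only the foreground region inside the box keeps its original values.
    """
    top, left, bottom, right = box
    return [
        [
            pixel if top <= r < bottom and left <= c < right else fill
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]
```

For example, padding a 3x3 image with the box `(1, 1, 3, 3)` keeps only the lower-right 2x2 block and sets the remaining five pixels to `fill`.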