In this paper, we propose a fast and accurate deep network-based object tracking method, which combines feature representation, template tracking and foreground detection into a single framework for robust tracking. The proposed framework consists of a backbone network, which feeds into two parallel networks, TmpNet for template tracking and FgNet for foreground detection. The backbone network is a pre-trained modified VGG network, in which a few parameters need to be fine-tuned for adapting to the tracked object. FgNet is a fully convolutional network to distinguish the foreground from background in a pixel-to-pixel manner. The parameter in TmpNet is the learned channel-wise target template, which initializes in the first frame and performs fast template tracking in the test frames. To enable each component to work closely with each other, we use a multi-task loss to end-to-end train the proposed framework. In online tracking, we combine the score maps from TmpNet and FgNet to find the optimal tracking results. Experimental results on object tracking benchmarks demonstrate that our approach achieves favorable tracking accuracy against the state-of-the-art trackers while running at a real-time speed of 38 fps.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.