Fully convolutional Siamese networks have drawn great attention in visual tracking for their balance of tracking accuracy and speed. However, even advanced trackers retain some inherent inaccuracy, because they learn only a general matching model from large-scale datasets through offline training. The resulting target template lacks sufficient discriminative information and does not adapt well to the current tracking sequence. In this paper, we introduce a channel attention mechanism into the network to better learn the matching model. For the online tracking phase, we design an initial matting guidance strategy in which 1) a superpixel matting algorithm is applied to extract the target foreground in the initial frame, and 2) the matted, foreground-only image is fed into the network and its features are fused with those of the original image. Under matting guidance, the fused target template captures more detail of the target appearance and more structural information from superpixels, which supports robust tracking. Experimental results on the object tracking benchmark (OTB) show that our approach achieves excellent performance while running at real-time speed.
INDEX TERMS Visual tracking, Siamese network, matching model, superpixel, channel attention.
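
As a rough illustration of two components named in the abstract, the sketch below shows one common form of channel attention and one possible way to fuse the matted-image features with the original template features. The abstract specifies neither, so everything here is an assumption: the squeeze-and-excitation-style gating, the weighted element-wise sum, and the names `channel_attention`, `fuse_templates`, `w1`, `w2`, and `alpha` are all hypothetical. Feature maps are plain nested lists (channels x height x width) to keep the example self-contained.

```python
import math


def channel_attention(feat, w1, w2):
    """Reweight channels with a squeeze-and-excitation-style gate (assumed form).

    feat: list of C channels, each an H x W nested list of floats.
    w1:   C_r x C weight matrix (hypothetical; learned in a real network).
    w2:   C x C_r weight matrix (hypothetical; learned in a real network).
    """
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat
    ]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale every value in a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feat)]


def fuse_templates(original, matted, alpha=0.5):
    """Blend two feature maps of equal shape by a weighted element-wise sum.

    The fusion rule and the weight alpha are assumptions for illustration.
    """
    return [
        [
            [alpha * o + (1.0 - alpha) * m for o, m in zip(row_o, row_m)]
            for row_o, row_m in zip(ch_o, ch_m)
        ]
        for ch_o, ch_m in zip(original, matted)
    ]
```

In this sketch, the fused template would replace the plain initial-frame template when matching against the search region; how the fusion is actually performed in the proposed tracker is not stated in the abstract.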