Object tracking algorithms based on Siamese networks typically extract the deep features of the target from the first frame of a video sequence as a template and use that template throughout the tracking process. Because the manually annotated target in the first frame is accurate, these algorithms tend to perform stably. However, a template extracted only from the first frame adapts poorly to changes in the target's features. Inspired by transformer-based feature fusion networks, this paper proposes a template update module called the multi-template temporal information fusion module (MTFM), which can be trained offline. By fusing multiple target template features over the time series, the template keeps adapting to changes in target appearance during tracking. To train the MTFM, this paper proposes a training method that uses time-series data with mean square error (MSE) as the loss function. The MTFM is applied to the SiamFC++ tracker and obtains good experimental results on three challenging datasets: VOT2016, OTB100 and GOT-10k. The algorithm runs at about 200 fps on a graphics processing unit (GPU), which indicates good real-time performance.
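As a rough illustration of fusing several template features and scoring the result with MSE, the sketch below uses a single scaled dot-product attention step over flattened template vectors. This is an assumption made for illustration only: the abstract does not specify the MTFM architecture, and the names `fuse_templates` and `mse_loss`, the choice of query, and the vector shapes are all hypothetical.

```python
import numpy as np

def fuse_templates(templates, query):
    """Attention-weighted fusion of T template feature vectors.

    templates: (T, D) float array, one flattened feature vector per stored template.
    query:     (D,) float array, e.g. the first-frame template feature (assumed).
    Returns the fused (D,) vector and the (T,) attention weights.
    """
    d = templates.shape[1]
    scores = templates @ query / np.sqrt(d)   # similarity of each template to the query
    scores = scores - scores.max()            # stabilize the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ templates, weights

def mse_loss(pred, target):
    """Mean square error between a fused template and a target template."""
    return float(np.mean((pred - target) ** 2))

# Hypothetical data: 4 stored templates with 8-dimensional features.
rng = np.random.default_rng(0)
templates = rng.normal(size=(4, 8))
fused, weights = fuse_templates(templates, query=templates[0])
loss = mse_loss(fused, templates[-1])         # compare against the latest template
```

In an offline training setup the loss would be backpropagated through a learned fusion network; the fixed softmax weighting here only shows the data flow.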
In recent years, object tracking algorithms based on Siamese networks have gradually become mainstream in the field of object tracking because they balance speed and accuracy. Most Siamese-based trackers use only the template extracted from the first frame for subsequent tracking, in order to avoid introducing noise. However, with a single initial template it is difficult for the tracker to achieve its best performance in complex tracking environments involving occlusion, motion blur, and non-rigid deformation. Therefore, the present paper proposes a new multi-template fusion module based on a graph attention network (G-M module), which consists of two parts: a graph-attention-network-based feature-embedding module (G module) and a multi-template fusion module (M module). It can greatly reduce the background noise introduced by template updating while improving the tracker's ability to adapt to changes in object appearance. In addition, to get the most out of the G-M module, the present paper puts forward a two-stage threshold judgment mechanism for template updates. The Pearson correlation coefficient (PCC) is introduced and combined with APCE and the maximum response value (F-max) to filter out reliable templates for updating. The proposed method is applied to the SiamFC and SiamFC++ trackers. Extensive experiments on mainstream datasets, such as OTB2015, VOT2016, and GOT-10k, show that the proposed method can effectively update the tracking template and improve tracker performance.
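The three reliability measures named above can be sketched as follows. APCE is computed with its commonly used definition, |F_max - F_min|^2 divided by the mean of (F - F_min)^2 over the response map. How the paper orders its two stages and which thresholds it uses are not given in the abstract, so the staging in `is_reliable` and the threshold parameters are assumptions, and all function names are hypothetical.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a 2-D response map."""
    f_max, f_min = response.max(), response.min()
    denom = np.mean((response - f_min) ** 2)
    # A perfectly flat map has no peak; report zero confidence.
    return float((f_max - f_min) ** 2 / denom) if denom > 0 else 0.0

def pearson(a, b):
    """Pearson correlation coefficient between two equally sized feature arrays."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def is_reliable(response, candidate, reference, apce_thr, fmax_thr, pcc_thr):
    """Two-stage check (assumed ordering): response confidence, then similarity."""
    # Stage 1: the response map must have a strong, sharp peak.
    if response.max() < fmax_thr or apce(response) < apce_thr:
        return False
    # Stage 2: the candidate template must correlate with the reference template.
    return pearson(candidate, reference) >= pcc_thr

# Hypothetical example: a 5x5 response map with a single sharp peak.
response = np.zeros((5, 5))
response[2, 2] = 1.0
reference = np.arange(6.0)
candidate = 2.0 * reference + 1.0   # linearly related, so PCC is 1
ok = is_reliable(response, candidate, reference,
                 apce_thr=10.0, fmax_thr=0.5, pcc_thr=0.8)
```

A sharp single peak yields a high APCE, while a flat or multi-peaked map (typical under occlusion) yields a low one, which is why such a score is useful as a gate before accepting a new template.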