Object tracking algorithms based on Siamese networks typically extract the deep features of the target to be tracked from the first frame of the video sequence as a template and use that template throughout the tracking process. Because the manually annotated target in the first frame is accurate, these algorithms usually deliver stable performance. However, a template extracted only from the first frame struggles to adapt to changes in the target's appearance. Inspired by transformer‐based feature fusion networks, this paper proposes a template update module, the multi‐template temporal information fusion module (MTFM), which can be trained offline. By fusing the features of multiple target templates over time, the template continually adapts to changes in the target's appearance during tracking. To train the MTFM, this paper proposes a training method that uses time‐series data with the mean square error (MSE) as the loss function. The MTFM is applied to the SiamFC++ tracker and achieves good experimental results on three challenging datasets: VOT2016, OTB100 and GOT‐10k. The algorithm runs at about 200 fps on a graphics processing unit (GPU), demonstrating good real‐time performance.
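The abstract does not specify the internal architecture of the MTFM; the following is a minimal sketch, assuming a transformer‐style attention fusion in which the first‐frame template acts as the query over a set of historical template features, with the output supervised by an MSE loss against a reference template from a later frame. All class names, tensor shapes, and the supervision scheme shown here are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class MultiTemplateFusion(nn.Module):
    """Hypothetical multi-template fusion: attention over historical templates."""

    def __init__(self, channels=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, templates):
        # templates: (B, T, H*W, C) -- T template feature maps, flattened spatially.
        b, t, n, c = templates.shape
        query = templates[:, 0]                   # first-frame template as query
        memory = templates.reshape(b, t * n, c)   # all templates as key/value
        fused, _ = self.attn(query, memory, memory)
        return self.norm(query + fused)           # residual-updated template


# Offline training step: MSE between the fused template and a reference
# template feature extracted from a later frame (hypothetical tensors).
if __name__ == "__main__":
    module = MultiTemplateFusion(channels=256)
    templates = torch.randn(4, 3, 49, 256)        # e.g. three 7x7x256 templates
    reference = torch.randn(4, 49, 256)           # supervision target
    loss = nn.functional.mse_loss(module(templates), reference)
    loss.backward()
```

Because such a module only refines the template branch, it can be attached to an existing Siamese tracker such as SiamFC++ without retraining the backbone, which is consistent with the offline training and high frame rate reported in the abstract.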