Deep learning-based template matching in remote sensing has received increasing research attention. Existing anchor box-based and anchor-free methods often suffer from low template localization accuracy in the presence of multimodal, nonrigid deformation and occlusion. To address this problem, we transform the template matching task into a center-point localization task for the first time and propose an end-to-end template matching method based on a novel fully convolutional Siamese network. Furthermore, we propose an adaptive shrinkage cross-correlation scheme, which improves the precision of template localization and alleviates the impact of background clutter without adding any parameters. We also design a scheme that leverages keypoint information to assist in locating the template center, thereby enhancing the precision of template localization. We construct a multimodal template matching dataset to verify the performance of the method in dealing with differences in view, scale, rotation and occlusion in practical application scenarios. Extensive experiments on a public dataset, OTB, the proposed dataset, as well as a remote sensing dataset, SEN1-2, demonstrate that our method achieves state-of-the-art performance.