Aiming at the problem of inaccurate extraction of feature points by the traditional image matching method, low robustness, and problems such as diffculty in inentifying feature points in area with poor texture. This paper proposes a new local image feature matching method, which replaces the traditional sequential image feature detection, description and matching steps. First, extract the coarse features with a resolution of 1/8 from the original image, then tile to a one-dimensional vector plus the positional encoding, feed them to the self-attention layer and cross-attention layer in the Transformer module, and finally get through the Differentiable Matching Layer and confidence matrix, after setting the threshold and the mutual closest standard, a Coarse-Level matching prediction is obtained. Secondly the fine matching is refined at the Fine-level match, after the Fine-level match is established, the image overlapped area is aligned by transforming the matrix to a unified coordinate, and finally the image is fused by the weighted fusion algorithm to realize the unification of seamless mosaic of images. This paper uses the self-attention layer and cross-attention layer in Transformers to obtain the feature descriptor of the image. Finally, experiments show that in terms of feature point extraction, LoFTR algorithm is more accurate than the traditional SIFT algorithm in both low-texture regions and regions with rich textures. At the same time, the image mosaic effect obtained by this method is more accurate than that of the traditional classic algorithms, the experimental effect is more ideal.