In tasks such as wood defect repair and the production of high-end wooden furniture, the texture of repaired or jointed areas must be consistent with the surrounding wood. This paper proposes WTSM-SiameseNet, a model for wood texture similarity matching, and introduces several improvements that address the limitations of traditional methods. First, because a fixed receptive field cannot adapt to textures of different sizes, a multi-receptive-field fusion feature extraction network is designed. It allows the model to autonomously select the optimal receptive field, which improves its flexibility and accuracy on wood textures at different scales. Second, because the interdependencies between layers in traditional serial attention mechanisms limit performance, a concurrent attention mechanism is designed; its dual-stream parallel structure reduces interlayer interference and strengthens feature capture. Third, because existing feature fusion methods disrupt spatial structure and lack interpretability, a feature fusion method based on feature correlation is proposed. This approach preserves the spatial structure of texture features and improves the interpretability and stability of both the fused features and the model. Finally, depthwise separable convolutions are introduced to reduce the large number of model parameters, which significantly improves training efficiency while maintaining model performance. Experiments were conducted on a wood texture similarity dataset of 7588 image pairs. WTSM-SiameseNet achieved an accuracy of 96.67% on the test set, a 12.91% improvement in accuracy and a 14.21% improvement in precision over the original SiameseNet before these improvements. Compared with CS-SiameseNet, accuracy increased by 2.86% and precision by 6.58%.
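
To illustrate why depthwise separable convolutions reduce the parameter count, the following minimal sketch compares the number of weights in a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The function names and the layer sizes (64 input channels, 128 output channels, 3x3 kernel) are assumptions chosen only for illustration; they are not taken from the WTSM-SiameseNet architecture, and bias terms are ignored.

```python
def standard_conv_params(in_channels, out_channels, kernel_size):
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(in_channels, out_channels, kernel_size):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    # Depthwise step: one k x k filter per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer sizes, not the paper's actual configuration.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(std, sep, round(sep / std, 3))
```

For these assumed sizes, the separable layer needs 8768 weights against 73728 for the standard layer, a ratio of about 0.119, which matches the general expression 1/C_out + 1/k^2 for the relative cost.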