Tire tread, a primary feature of tires, is difficult to detect reliably because of the wide variety of tread patterns and the scarcity of publicly available datasets. To address this issue, this paper introduces DCT-ResNet, a neural network designed for adaptive fusion of spatial and frequency-domain features. The method mitigates dataset limitations through generative networks and image augmentation techniques. First, the network extracts spatial features using down-sampling and hidden layers; in parallel, frequency-domain features are extracted by combining the Discrete Cosine Transform (DCT) with a dedicated frequency-domain network. A multi-head self-attention layer then fuses the two feature streams adaptively, ensuring reliable extraction of tire tread features. Experimental results confirm the effectiveness of the proposed approach: DCT-ResNet achieves classification accuracies of 99% on the tire tread dataset and 97% on CIFAR-10, and its tread-pattern similarity assessments are comparable to expert judgments. In adversarial testing, data augmentation substantially improves the network's robustness, allowing DCT-ResNet to outperform competing methods in resistance to interference. The method presented in this paper therefore has substantial practical value for high-reliability tire tread detection.
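To make the described pipeline concrete, the following is a minimal sketch of the spatial/frequency fusion idea, assuming a PyTorch implementation. The class and branch names (e.g., DCTResNetSketch), layer sizes, and the stand-in spatial branch are illustrative assumptions, not the authors' architecture; they only show how DCT-based frequency features and CNN spatial features can be fused with multi-head self-attention.

```python
# Illustrative sketch only: not the paper's implementation.
import math
import torch
import torch.nn as nn

def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = torch.arange(n).float()
    mat = torch.cos(math.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    mat[0] *= 1 / math.sqrt(2)
    return mat * math.sqrt(2 / n)

class DCTResNetSketch(nn.Module):
    def __init__(self, num_classes: int = 10, dim: int = 128, img_size: int = 32):
        super().__init__()
        # Spatial branch: down-sampling convolutions (a stand-in for ResNet stages).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
        # Frequency branch: 2-D DCT followed by a small frequency-domain network.
        self.register_buffer("D", dct_matrix(img_size))
        self.freq = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * img_size * img_size, dim), nn.ReLU(),
        )
        # Multi-head self-attention over the two feature "tokens" for adaptive fusion.
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 3, H, W)
        s = self.spatial(x)                      # spatial features, shape (B, dim)
        x_dct = self.D @ x @ self.D.t()          # per-channel 2-D DCT of the input
        f = self.freq(x_dct)                     # frequency features, shape (B, dim)
        tokens = torch.stack([s, f], dim=1)      # two tokens per sample, (B, 2, dim)
        fused, _ = self.attn(tokens, tokens, tokens)  # adaptive fusion via self-attention
        return self.head(fused.mean(dim=1))      # class logits, (B, num_classes)

# Example usage on CIFAR-10-sized inputs:
logits = DCTResNetSketch()(torch.randn(2, 3, 32, 32))  # -> shape (2, 10)
```

In this sketch the attention layer weighs the spatial and frequency tokens per sample, which is one plausible reading of "adaptive feature fusion"; the paper's actual fusion mechanism, backbone depth, and frequency-domain network may differ.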