Tomato leaf diseases pose a significant threat to plant growth and productivity, necessitating the accurate identification and timely management of these issues. Existing models for tomato leaf disease recognition can primarily be categorized into Convolutional Neural Networks (CNNs) and Visual Transformers (VTs). While CNNs excel in local feature extraction, they struggle with global feature recognition; conversely, VTs are advantageous for global feature extraction but are less effective at capturing local features. This discrepancy hampers the performance improvement of both model types in the task of tomato leaf disease identification. Currently, effective fusion models that combine CNNs and VTs are still relatively scarce. We developed an efficient CNNs and VTs fusion network named ECVNet for tomato leaf disease recognition. Specifically, we first designed a Channel Attention Residual module (CAR module) to focus on channel features and enhance the model’s sensitivity to the importance of feature channels. Next, we created a Convolutional Attention Fusion module (CAF module) to effectively extract and integrate both local and global features, thereby improving the model’s spatial feature extraction capabilities. We conducted extensive experiments using the Plant Village dataset and the AI Challenger 2018 dataset, with ECVNet achieving state-of-the-art recognition performance in both cases. Under the condition of 100 epochs, ECVNet achieved an accuracy of 98.88% on the Plant Village dataset and 86.04% on the AI Challenger 2018 dataset. The introduction of ECVNet provides an effective solution for the identification of plant leaf diseases.