Tomatoes possess significant nutritional and economic value. However, frequent diseases can detrimentally impact their quality and yield. Images of tomato diseases captured amidst intricate backgrounds are susceptible to environmental disturbances, presenting challenges in achieving precise detection and identification outcomes. This study focuses on tomato disease images within intricate settings, particularly emphasizing four prevalent diseases (late blight, gray leaf spot, brown rot, and leaf mold), alongside healthy tomatoes. It addresses challenges such as excessive interference, imprecise lesion localization for small targets, and heightened false-positive and false-negative rates in real-world tomato cultivation settings. To address these challenges, we introduce a novel method for tomato disease detection named TomatoDet. Initially, we devise a feature extraction module integrating Swin-DDETR’s self-attention mechanism to craft a backbone feature extraction network, enhancing the model’s capacity to capture details regarding small target diseases through self-attention. Subsequently, we incorporate the dynamic activation function Meta-ACON within the backbone network to further amplify the network’s ability to depict disease-related features. Finally, we propose an enhanced bidirectional weighted feature pyramid network (IBiFPN) for merging multi-scale features and feeding the feature maps extracted by the backbone network into the multi-scale feature fusion module. This enhancement elevates detection accuracy and effectively mitigates false positives and false negatives arising from overlapping and occluded disease targets within intricate backgrounds. Our approach demonstrates remarkable efficacy, achieving a mean Average Precision (mAP) of 92.3% on a curated dataset, marking an 8.7% point improvement over the baseline method. Additionally, it attains a detection speed of 46.6 frames per second (FPS), adeptly meeting the demands of agricultural scenarios.