Wildfires pose a severe threat to ecosystems, human life, and infrastructure, making early detection critical for timely intervention. Traditional fire detection systems typically rely on a single sensor and are often hindered by environmental conditions such as smoke, fog, or darkness at night. This paper proposes Adaptive Multi-Sensor Oriented Object Detection with Space–Frequency Selective Convolution (AMSO-SFS), a novel deep learning model designed for drone-based wildfire and smoke detection. AMSO-SFS combines optical, infrared, and Synthetic Aperture Radar (SAR) data to detect fire and smoke under varied visibility conditions. The model introduces a Space–Frequency Selective Convolution (SFS-Conv) module that enhances the discriminative power of features in both the spatial and frequency domains. In addition, AMSO-SFS pairs weakly supervised learning with scale- and angle-adaptive detection, allowing it to identify fire and smoke regions with minimal labeled data. Extensive experiments show that the proposed model outperforms current state-of-the-art (SoTA) models, delivering robust detection while remaining computationally efficient, which makes it suitable for real-time deployment on drones.