The rapid detection and accurate localization of wildfires are critical for effective disaster management and response. This study proposes an unmanned aerial vehicle (UAV)-based fire detection system built on a modified Miti-DETR model tailored to the computational constraints of drones. The enhanced architecture incorporates a redesigned AlexNet backbone with residual depthwise separable convolution blocks, significantly reducing computational load while improving feature extraction and accuracy. Furthermore, a novel residual self-attention mechanism addresses convergence issues in transformer networks, ensuring robust feature representation for complex aerial imagery. Trained on the FLAME dataset, which encompasses diverse fire scenarios, the model outperforms existing systems in Mean Average Precision (mAP) and Intersection over Union (IoU). Its ability to detect and localize fires across varied backgrounds underscores its practical value in real-world scenarios. This advancement represents a pivotal step in applying deep learning to real-time wildfire detection, with implications for broader emergency management applications.
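The abstract does not specify how these two components are implemented, so the sketch below is only an illustrative approximation in PyTorch: it assumes equal input and output channel counts, a single 3x3 depthwise stage followed by a 1x1 pointwise convolution with an additive skip connection for the residual depthwise separable block, and a standard multi-head attention layer whose unattended input is added back for the residual self-attention mechanism. All class names, dimensions, and hyperparameters here are hypothetical, not the paper's actual design.

```python
import torch
import torch.nn as nn


class ResidualDepthwiseSeparableBlock(nn.Module):
    """Illustrative depthwise separable convolution with an additive skip connection."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise convolution: one filter per channel (groups=channels).
        self.depthwise = nn.Conv2d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels, bias=False,
        )
        # Pointwise (1x1) convolution mixes information across channels.
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.pointwise(self.depthwise(x))
        out = self.bn(out)
        # Residual connection: add the block input back before activation.
        return self.act(out + x)


class ResidualSelfAttention(nn.Module):
    """Illustrative self-attention whose output carries the unattended input forward."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(x, x, x)
        # Adding x back propagates the raw input alongside the attention output,
        # the kind of residual path the abstract credits with easing convergence.
        return self.norm(attended + x)


if __name__ == "__main__":
    feats = torch.randn(1, 64, 56, 56)    # dummy backbone feature map
    tokens = torch.randn(1, 196, 256)     # dummy transformer token sequence
    print(ResidualDepthwiseSeparableBlock(64)(feats).shape)  # (1, 64, 56, 56)
    print(ResidualSelfAttention(256)(tokens).shape)          # (1, 196, 256)
```

In the actual model, the channel widths, normalization choices, and the placement of these blocks within the AlexNet-style backbone and the transformer encoder/decoder would follow the paper's design rather than the defaults shown here.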