This research presents an approach to aerial image analysis that integrates RT-DETR-X, the largest variant of the Real-Time DEtection TRansformer (RT-DETR), with the Slicing Aided Hyper Inference (SAHI) method, evaluated on the VisDrone-DET dataset. Aimed at making drones more effective across applications such as water conservancy, geological exploration, and military operations, the study builds on the real-time, end-to-end object detection capabilities of RT-DETR-X. RT-DETR-X combines high speed with high accuracy. On the COCO benchmark it reports 54.8% Average Precision (AP) at 74 frames per second (FPS) on an NVIDIA T4 GPU, surpassing comparable real-time detectors in both speed and accuracy, which makes it a strong candidate for UAV aerial photography. The research examines the VisDrone-DET dataset in detail: its UAV scenes contain a wide variety of small targets across 10 object categories, providing a demanding platform for model testing. The original images are used for training and evaluation, while the SAHI method is applied to improve the detection of small objects. By analyzing the model's performance across a range of scenarios, together with the experimental environment, the paper assesses the effect of combining RT-DETR with SAHI. The results indicate that this combination improves the model's detection accuracy, offering a practical framework for efficient aerial surveillance and opening new avenues for image analysis in UAV applications.
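The core idea behind SAHI is to run the detector on overlapping tiles of a large image, so that small objects occupy more of the detector's input, then shift each tile's boxes back into full-image coordinates and merge duplicates found in overlapping tiles. The snippet below is a minimal, framework-free sketch of that idea, not the paper's code or the SAHI library's implementation. Several pieces are assumptions for illustration: `detect_slice` is a hypothetical callable standing in for an RT-DETR-X forward pass on a cropped tile, the 512-pixel slice size and 0.2 overlap ratio are illustrative defaults, and duplicates are merged with plain class-wise greedy non-maximum suppression (NMS) as a simplification of the merge strategies SAHI itself offers.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float, int]       # (box, score, class_id)
Window = Tuple[int, int, int, int]       # slice bounds (x0, y0, x1, y1) in image pixels


def slice_starts(length: int, slice_len: int, overlap: float) -> List[int]:
    """Start offsets along one axis so consecutive slices overlap by `overlap` (a ratio)."""
    if slice_len >= length:
        return [0]
    step = max(1, int(slice_len * (1.0 - overlap)))
    starts = list(range(0, length - slice_len, step))
    starts.append(length - slice_len)  # last slice sits flush with the image border
    return starts


def make_slices(width: int, height: int, slice_w: int, slice_h: int, overlap: float) -> List[Window]:
    """Tile the whole image with overlapping windows (row-major order)."""
    return [(x, y, min(x + slice_w, width), min(y + slice_h, height))
            for y in slice_starts(height, slice_h, overlap)
            for x in slice_starts(width, slice_w, overlap)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_thr: float) -> List[Detection]:
    """Class-wise greedy NMS: keep the highest-scoring box, drop same-class overlaps."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) < iou_thr for k in kept):
            kept.append(det)
    return kept


def sliced_inference(width: int, height: int,
                     detect_slice: Callable[[Window], List[Detection]],
                     slice_w: int = 512, slice_h: int = 512,
                     overlap: float = 0.2, iou_thr: float = 0.5) -> List[Detection]:
    """Run a (hypothetical) per-slice detector on every tile and merge the results."""
    merged: List[Detection] = []
    for x0, y0, x1, y1 in make_slices(width, height, slice_w, slice_h, overlap):
        for (bx1, by1, bx2, by2), score, cls in detect_slice((x0, y0, x1, y1)):
            # Shift slice-local coordinates back into full-image coordinates.
            merged.append(((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0), score, cls))
    return nms(merged, iou_thr)


def demo_detector(window: Window) -> List[Detection]:
    """Hypothetical stand-in detector: 'sees' one small object if it lies fully in the slice."""
    x0, y0, x1, y1 = window
    obj = (600, 450, 620, 470)  # a 20x20-pixel target in full-image coordinates
    if obj[0] >= x0 and obj[1] >= y0 and obj[2] <= x1 and obj[3] <= y1:
        return [((obj[0] - x0, obj[1] - y0, obj[2] - x0, obj[3] - y0), 0.9, 3)]
    return []


if __name__ == "__main__":
    print(sliced_inference(1000, 1000, demo_detector))
```

In this toy run on a 1000×1000 image, the small target falls inside four overlapping slices; after the boxes are shifted back, all four coincide and NMS reduces them to a single detection at (600, 450, 620, 470). A real pipeline would crop the image to each window, run the detector on the crop, and typically also add a full-image pass so that large objects split across tiles are still found.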