Cities around the world are building "smart cities" that use technology to make residents' lives better and safer. Video surveillance is a crucial part of smart city infrastructure: cameras installed at strategic locations monitor public spaces and provide real-time footage to law enforcement and other local authorities. Deep learning algorithms offer a more effective solution for this task, but research in this area still faces significant challenges from variations in target size, shape changes, occlusion, and illumination conditions when scenes are viewed from a drone's perspective. To address these issues, this study presents a highly effective and robust approach for object detection in aerial images. First, the concept of Bi-PAN-FPN is introduced to improve the neck of YOLOv8-s, addressing the common problem that small targets in aerial images are easily misdetected or missed. By fully considering and reusing multiscale features, we achieve as advanced and complete a feature fusion process as possible. Second, to reduce the number of model parameters and prevent information loss during long-distance feature transfer, the GhostblockV2 structure replaces part of the C2f modules in the backbone of the benchmark model. Third, the hyperparameters of the proposed model are optimized with the Enhanced Dwarf Mongoose Optimization Algorithm (EDMOA). Finally, WiseIoU loss with a dynamic nonmonotonic focusing mechanism is used as the bounding box regression loss. The detector accounts for varying anchor box quality by using "outlier" evaluations, thereby improving the overall performance of the detection task.
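To make the dynamic nonmonotonic focusing mechanism more concrete, the following is a minimal sketch of a Wise-IoU-style loss for a single pair of boxes. It is not this study's implementation. It assumes the commonly described formulation in which an "outlier degree" beta is the ratio of a box's IoU loss to a running mean of the IoU loss, and a gain of beta / (delta * alpha^(beta - delta)) scales the loss. The function names, the (x1, y1, x2, y2) box format, and the values alpha = 1.9 and delta = 3.0 are illustrative assumptions; the abstract does not state them.

```python
import math


def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Nonmonotonic gain: small for very good boxes (beta near 0) and for
    outliers (large beta), largest for boxes of ordinary quality.
    alpha and delta are assumed values, not taken from the source."""
    return beta / (delta * alpha ** (beta - delta))


def wise_iou_loss(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """Wise-IoU-style loss for one predicted box (hypothetical helper).

    mean_iou_loss is a running mean of the IoU loss over recent training
    steps; in a real training loop it would be updated with momentum and
    excluded from gradient computation.
    """
    l_iou = 1.0 - iou_xyxy(pred, target)

    # Centre-distance attention, normalised by the smallest enclosing box.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    dist = ((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (wg ** 2 + hg ** 2 + 1e-9)
    r_dist = math.exp(dist)

    # Outlier degree: how much worse this box is than the recent average.
    beta = l_iou / max(mean_iou_loss, 1e-9)
    return focusing_gain(beta, alpha, delta) * r_dist * l_iou


if __name__ == "__main__":
    target = (10.0, 10.0, 50.0, 50.0)
    for pred in [(11.0, 11.0, 51.0, 51.0), (20.0, 20.0, 60.0, 60.0), (45.0, 45.0, 85.0, 85.0)]:
        print(pred, wise_iou_loss(pred, target, mean_iou_loss=0.5))
```

Under these assumptions, the gain rises and then falls as beta grows, so neither near-perfect boxes nor low-quality outliers dominate the regression loss. This is one way to read the statement that the detector accounts for varying anchor box quality through "outlier" evaluations.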