In recent years, advances in deep learning have driven the development of increasingly sophisticated object detectors in computer vision. Images captured by unmanned aerial vehicles (UAVs), however, pose challenges that are far more pronounced than in natural scenes with larger, more distinct objects: small and densely clustered objects, large scale variance, occlusion, and cluttered backgrounds. Recent research has shown growing interest in anchor-free detectors, attention mechanisms, and transformers as alternatives to convolutional neural networks. Building on these developments, this study introduces a novel anchor-free object detection framework that leverages a transformer backbone for feature extraction. A cardinal grouping-based split attention module is integrated into the network to selectively extract the most pertinent features. The resulting network, termed the Pyramid Vision Split Attention Module Network (PvSAMNet), employs a detection head with three branches (classification, confidence, and regression) that jointly produce the final detections from drone images. In addition, an Intersection over Union (IoU)-balanced loss function is employed to balance the classification and localization tasks. The proposed detector is evaluated on the VisDrone-DET dataset using the average precision (AP) and average recall (AR) metrics, and it outperforms competing detectors with an AP of 38.74. This study thus contributes a framework that addresses the distinctive complexities of UAV imagery and achieves promising results in comparative evaluations.
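To make the architecture described above concrete, the following is a minimal PyTorch sketch of a ResNeSt-style split attention block feeding a three-branch anchor-free head. The abstract does not specify layer sizes or branch depths, so the channel counts, radix/cardinality settings, branch structure, and regression parameterization below are illustrative assumptions, not the paper's exact design; the transformer backbone and IoU-balanced loss are omitted.

```python
# Illustrative sketch only: channel counts, radix/cardinality, and branch
# depths are assumptions; the abstract does not specify them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SplitAttention(nn.Module):
    """Simplified split attention: the input is expanded into `radix` splits
    per cardinal group, and a learned softmax over the splits weights the
    most relevant features (a stand-in for the paper's cardinal
    grouping-based split attention module)."""

    def __init__(self, channels: int, radix: int = 2, cardinality: int = 1):
        super().__init__()
        self.radix, self.channels = radix, channels
        inter = max(channels * radix // 4, 32)
        self.conv = nn.Conv2d(channels, channels * radix, kernel_size=3,
                              padding=1, groups=cardinality * radix, bias=False)
        self.bn = nn.BatchNorm2d(channels * radix)
        self.fc1 = nn.Conv2d(channels, inter, 1, groups=cardinality)
        self.fc2 = nn.Conv2d(inter, channels * radix, 1, groups=cardinality)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c = x.shape[0], self.channels
        splits = F.relu(self.bn(self.conv(x)))                   # (B, C*radix, H, W)
        splits = splits.view(b, self.radix, c, *splits.shape[2:])
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)   # global context (B, C, 1, 1)
        attn = self.fc2(F.relu(self.fc1(gap)))                   # (B, C*radix, 1, 1)
        attn = attn.view(b, self.radix, c, 1, 1).softmax(dim=1)  # softmax over splits
        return (attn * splits).sum(dim=1)                        # weighted fusion (B, C, H, W)


class DetectionHead(nn.Module):
    """Three-branch anchor-free head (classification, confidence, regression),
    applied per feature-pyramid level."""

    def __init__(self, in_channels: int = 256, num_classes: int = 10):
        super().__init__()
        self.attn = SplitAttention(in_channels)

        def branch(out_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, out_ch, 1),
            )

        self.cls_branch = branch(num_classes)  # per-class logits
        self.conf_branch = branch(1)           # objectness / confidence score
        self.reg_branch = branch(4)            # assumed FCOS-style distances (l, t, r, b)

    def forward(self, feat: torch.Tensor):
        feat = self.attn(feat)
        return self.cls_branch(feat), self.conf_branch(feat), self.reg_branch(feat)


if __name__ == "__main__":
    head = DetectionHead(in_channels=256, num_classes=10)  # VisDrone-DET defines 10 classes
    p3 = torch.randn(2, 256, 80, 80)  # hypothetical pyramid level from the backbone
    cls, conf, reg = head(p3)
    print(cls.shape, conf.shape, reg.shape)  # (2,10,80,80) (2,1,80,80) (2,4,80,80)
```

In this sketch, the softmax over the radix dimension is what performs the feature selection the abstract attributes to the split attention module: each spatial location receives a convex combination of the splits, so the network can emphasize whichever split carries the most pertinent features for the densely packed small objects typical of drone imagery.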