Objectives: Computer vision tasks such as object detection, tracking, and counting are significant for surveillance. Factors such as altitude, camera angle, occlusion, and motion blur make these tasks more challenging. This study aims to present a method that overcomes these factors and implements surveillance quickly and accurately for smaller and larger object aspect ratios. Methods: Horizontal Bounding Boxes (HBB) and Oriented Bounding Boxes (OBB) are each evaluated against their respective ground truths. The PASCAL VOC 07 metric is adopted to calculate the mean average precision. Based on the score, the original implementation of Mask R-CNN applies the mask head only to the 100 highest-scoring HBBs. Subsequently, the mask head was extended to all HBBs remaining after Non-Maximum Suppression. This modification allowed Mask R-CNN, Cascade Mask R-CNN, and Hybrid Task Cascade to be evaluated on a wider range of bounding boxes. Findings: This research explores and compares approaches and techniques for object detection, focusing on oriented object detection and the challenges posed by geometric variations. It also addresses the impact of different models, such as Mask R-CNN, Faster R-CNN OBB + RoI Transformer, and Faster R-CNN OBB + Dpool, on performance, and highlights the importance of handling the numerical instability caused by extremely small instances. The findings are presented visually in Figure 2, which shows the performance of the various networks. Novelty: The study summarizes the findings of existing research papers and identifies research gaps. The performance parameters and analysis of the various networks show how the methods have evolved over the years. 
Changes to the network, such as mask transferring, and to the dataset affect both the accuracy for smaller and larger objects and the speed of execution; these effects are explained in the Results and Discussion and in the Conclusions.
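
The PASCAL VOC 07 metric named in the Methods is commonly computed as an 11-point interpolated average precision per class, averaged over classes to give the mean average precision. The following is a minimal sketch of that computation, assuming the per-class precision and recall values have already been obtained by matching detections to ground truth. The function names `voc07_ap` and `mean_ap` and the example class names are illustrative assumptions, not taken from the study.

```python
import numpy as np


def voc07_ap(recall, precision):
    """11-point interpolated average precision (PASCAL VOC 2007 style).

    `recall` and `precision` are same-length sequences, one entry per
    detection after sorting detections by descending confidence.
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    # Recall thresholds 0.0, 0.1, ..., 1.0
    for i in range(11):
        t = i / 10
        reached = recall >= t
        # Interpolated precision: best precision at any recall >= t,
        # or 0 if that recall level is never reached.
        p = precision[reached].max() if reached.any() else 0.0
        ap += p / 11.0
    return ap


def mean_ap(ap_per_class):
    """Mean of per-class AP values; `ap_per_class` maps class name -> AP."""
    return float(np.mean(list(ap_per_class.values())))


# Hypothetical usage: a detector that reaches recall 0.5 at precision 1.0
# scores on 6 of the 11 thresholds (0.0 through 0.5).
ap_example = voc07_ap([0.5], [1.0])
map_example = mean_ap({"class_a": ap_example, "class_b": 1.0})
```

Because the 11-point form samples precision only at fixed recall levels, it is coarser than the all-point area-under-curve variant used in later VOC editions, so scores from the two are not directly comparable.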