The growing need for effective object detection models on mobile devices makes it essential to design models that are both accurate and have fewer parameters. In this paper, we introduce a YOLOv8 Res2Net Extended Network (YOLOv8-CGRNet) approach that achieves enhanced precision under standards suitable for lightweight mobile devices. Firstly, we merge YOLOv8 with the Context GuidedNet (CGNet) and Residual Network with multiple branches (Res2Net) structures, augmenting the model’s ability to learn deep Res2Net features without adding to its complexity or computational demands. CGNet effectively captures local features and contextual surroundings, utilizing spatial dependencies and context information to improve accuracy. By reducing the number of parameters and saving on memory usage, it adheres to a ‘deep yet slim’ principle, lessening channel numbers between stages. Secondly, we explore an improved pyramid network (FPN) combination and employ the Stage Partial Spatial Pyramid Pooling Fast (SimPPFCSPC) structure to further strengthen the network’s capability in processing the FPN. Using a dynamic non-monotonic focusing mechanism (FM) gradient gain distribution strategy based on Wise-IoU (WIoU) in an anchor-free context, this method effectively manages low-quality examples. It enhances the overall performance of the detector. Thirdly, we introduce Unifying Object Detection Heads with Attention, adapting to various input scenarios and increasing the model’s flexibility. Experimental datasets include the commonly used detection datasets: VOC2007, VOC2012, and VisDrone. The experimental results demonstrate a 4.3% improvement in detection performance by the proposed framework, affirming superior performance over the original YOLOv8 model in terms of accuracy and robustness and providing insights for future practical applications.