Tea bud target detection is essential for mechanized selective harvesting. To address the challenges of low detection precision caused by the complex backgrounds of tea leaves, this paper introduces a novel model called Tea-YOLOv8s. First, multiple data augmentation techniques are employed to increase the amount of information in the images and improve their quality. Then, the Tea-YOLOv8s model combines deformable convolutions, attention mechanisms, and improved spatial pyramid pooling, thereby enhancing the model’s ability to learn complex object invariance, reducing interference from irrelevant factors, and enabling multi-feature fusion, resulting in improved detection precision. Finally, the improved YOLOv8 model is compared with other models to validate the effectiveness of the proposed improvements. The research results demonstrate that the Tea-YOLOv8s model achieves a mean average precision of 88.27% and an inference time of 37.1 ms, with an increase in the parameters and calculation amount by 15.4 M and 17.5 G, respectively. In conclusion, although the proposed approach increases the model’s parameters and calculation amount, it significantly improves various aspects compared to mainstream YOLO detection models and has the potential to be applied to tea buds picked by mechanization equipment.