To optimize rice yield and enhance quality through targeted field management at each growth stage, rapid and accurate identification of rice growth stages is crucial. This study presents the Mobilenetv3-YOLOv8 rice growth-stage recognition model, designed for high efficiency and accuracy using Unmanned Aerial Vehicle (UAV) imagery. A UAV captured images of rice fields at five distinct growth stages from two altitudes (3 m and 20 m) in two independent field experiments, and these images were processed into training, validation, and test datasets for model development. Mobilenetv3 was introduced to replace the standard YOLOv8 backbone, providing robust small-scale feature extraction through multi-scale feature fusion. In addition, the Coordinate Attention (CA) mechanism was integrated into the YOLOv8 backbone; it outperformed the Convolutional Block Attention Module (CBAM) by capturing position-sensitive information more effectively and focusing on the most informative pixel regions. Compared with the original YOLOv8, the enhanced Mobilenetv3-YOLOv8 model improved rice growth-stage identification accuracy and reduced the computational load. With an input image size of 400 × 400 pixels and CA inserted in the second and third backbone layers, the model achieved its best performance, reaching 84.00% mAP and 84.08% recall. The optimized model required 6.60 M parameters and 0.9 Giga Floating Point Operations (GFLOPs), with precision values of 94.88%, 93.36%, 67.85%, 78.31%, and 85.46% for the tillering, jointing, booting, heading, and filling stages, respectively. The experimental results indicate that the optimized Mobilenetv3-YOLOv8 performs well and has potential for future deployment on edge computing devices for practical in-field rice growth-stage recognition.
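
To make the attention modification concrete, the sketch below shows a minimal PyTorch implementation of a Coordinate Attention block of the kind integrated into the backbone. It is a generic rendering of the published CA design, not the exact configuration used in this study; the module names, reduction ratio, and activation choice are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Coordinate Attention: factorizes spatial attention into two 1-D
    encodings along height and width, so positional information is kept."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        # Pool over width -> (B, C, H, 1) and over height -> (B, C, 1, W)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))
        mid = max(8, channels // reduction)  # assumed bottleneck width
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        # Encode both directions jointly, then split back apart
        y = torch.cat([x_h, x_w], dim=2)          # (B, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        y_w = y_w.permute(0, 1, 3, 2)              # (B, mid, 1, W)
        a_h = torch.sigmoid(self.conv_h(y_h))      # per-row attention
        a_w = torch.sigmoid(self.conv_w(y_w))      # per-column attention
        return x * a_h * a_w                       # position-wise reweighting
```

In a setup like the one described, such a block would be appended to the outputs of the second and third backbone stages so that the position-aware reweighting acts on the feature maps feeding the detection neck.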