The recognition and localization of strawberries are crucial for automated harvesting and yield prediction. This article proposes RTF-YOLO (RepVGG-Triplet-FocalLoss-YOLO), a novel network model for real-time strawberry detection. First, an efficient convolution module based on structural reparameterization was designed and integrated into the backbone and neck networks to improve detection speed. Then, the triplet attention mechanism was embedded into the last two detection heads to strengthen the network's extraction of strawberry features and improve detection accuracy. Lastly, the focal loss function was adopted to enhance the model's recognition of challenging strawberry targets, thereby improving its recall rate. The experimental results demonstrated that the RTF-YOLO model achieved a detection speed of 145 FPS (frames per second), a precision of 91.92%, a recall rate of 81.43%, and an mAP (mean average precision) of 90.24% on the test dataset, improvements of 19%, 2.3%, 4.2%, and 3.6%, respectively, over the YOLOv5s baseline. The RTF-YOLO model outperformed other mainstream models, mitigated the false positives and false negatives in strawberry detection caused by variations in illumination and occlusion, and substantially increased detection speed. The proposed model can provide technical support for strawberry yield estimation and automated harvesting.
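As a point of reference for the loss modification mentioned above, the following is a minimal PyTorch sketch of the standard focal loss formulation (Lin et al.), applied to a binary classification/objectness branch as is common in YOLO-style detectors. The class name, hyperparameter values (gamma, alpha), and reduction choice are illustrative assumptions, not the settings reported for RTF-YOLO.

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    """Focal loss wrapped around binary cross-entropy.
    gamma down-weights easy, well-classified examples; alpha balances
    positive and negative samples. Values below are illustrative defaults,
    not the paper's reported settings."""
    def __init__(self, gamma: float = 2.0, alpha: float = 0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.bce = nn.BCEWithLogitsLoss(reduction="none")

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Per-element BCE gives -log(p_t)
        bce_loss = self.bce(logits, targets)
        p = torch.sigmoid(logits)
        # p_t = p for positive targets, 1 - p for negative targets
        p_t = targets * p + (1.0 - targets) * (1.0 - p)
        alpha_t = targets * self.alpha + (1.0 - targets) * (1.0 - self.alpha)
        # Modulating factor (1 - p_t)^gamma suppresses easy examples,
        # focusing training on hard (occluded, poorly lit) targets
        loss = alpha_t * (1.0 - p_t) ** self.gamma * bce_loss
        return loss.mean()
```

In this sketch, hard targets (small p_t) keep most of their loss while easy ones are scaled down by the modulating factor, which is the mechanism the abstract credits for the improved recall on challenging strawberry instances.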