In order to guide the orchard management robot to realize autonomous steering in the row ends of a complex orchard environment, this paper proposes setting up steering markers in the form of fruit trees at the end of the orchard rows and realizing the rapid detection of the steering markers of the orchard management robot through the fast and accurate recognition and classification of different steering markers. First, a high-precision YOLOv7 model is used, and the depthwise separable convolution (DSC) is used instead of the 3 × 3 ordinary convolution, which improves the speed of model detection; at the same time, in order to avoid a decline in detection accuracy, the Convolutional Block Attention Module (CBAM) is added to the model, and the Focal loss function is introduced to improve the model’s attention to the imbalanced samples. Second, a binocular camera is used to quickly detect the steering markers, obtain the position information of the robot to the steering markers, and determine the starting point position of the robot’s autonomous steering based on the position information. Our experiments show that the average detection accuracy of the improved YOLOv7 model reaches 96.85%, the detection time of a single image reaches 15.47 ms, and the mean value of the localization error is 0.046 m. Comparing with the YOLOv4, YOLOv4-tiny, YOLOv5-s, and YOLOv7 models, the improved YOLOv7 model outperforms the other models in terms of combined detection time and detection accuracy. Therefore, the model proposed in this paper can quickly and accurately perform steering marker detection and steering start point localization, avoiding problems such as steering errors and untimely steering, shortening the working time and improving the working efficiency. This model also provides a reference and technical support for research on robot autonomous steering in other scenarios.