In the integrated machining of large aerospace components, binocular vision is often used to meet the positioning accuracy required for servo control of an industrial robot's end effector. This approach measures the end-effector position accurately but is limited by a restricted field of view and a small depth of field, so a monocular camera is needed to servo-guide the machining end of the robot arm. This paper therefore proposes a monocular depth estimation method for mobile machining with a large field of view, based on the fusion of an improved YOLOv8 and a CNN. First, a dataset construction method based on virtual-real fusion is proposed to address the difficulty of measuring the depth information that corresponds to the training images. Second, the proposed Yolov8sim-CNN cascade neural network performs fast localization and depth prediction of the target workpiece, thereby enabling servo guidance of the robot arm end. Experimental results show that the proposed Yolov8sim-CNN network maintains high detection accuracy and is substantially more accurate than a CNN-only method, indicating better fitting ability.
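The cascade idea (localize the workpiece first, then estimate depth only from the localized region) can be illustrated with a minimal sketch. This is not the Yolov8sim-CNN network: both stages are hypothetical stand-ins chosen so the example runs without trained weights. `detect_workpiece` replaces the detector with simple thresholding, and `estimate_depth` replaces the learned CNN regressor with the pinhole relation Z = f·W/w, which assumes a known focal length in pixels and a known physical workpiece width. All function and parameter names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def detect_workpiece(image: Image, threshold: float = 0.5) -> Optional[Box]:
    """Stage 1 stand-in: bounding box of all pixels brighter than threshold.

    A trained detector would go here; thresholding is only a placeholder.
    """
    xs = [x for row in image for x, v in enumerate(row) if v > threshold]
    ys = [y for y, row in enumerate(image) if any(v > threshold for v in row)]
    if not xs:
        return None  # nothing detected
    return min(xs), min(ys), max(xs), max(ys)


def crop(image: Image, box: Box) -> Image:
    """Cut the detected region out of the full image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


def estimate_depth(patch: Image, focal_px: float, real_width_m: float) -> float:
    """Stage 2 stand-in: pinhole relation Z = f * W / w.

    A learned regressor would predict depth from the patch instead.
    """
    width_px = len(patch[0])
    return focal_px * real_width_m / width_px


def cascade_depth(
    image: Image, focal_px: float, real_width_m: float
) -> Optional[Tuple[Box, float]]:
    """Run detection, then depth estimation on the detected region only."""
    box = detect_workpiece(image)
    if box is None:
        return None
    return box, estimate_depth(crop(image, box), focal_px, real_width_m)


def make_test_image() -> Image:
    """6x8 dark image with a bright block 4 pixels wide and 2 pixels tall."""
    image = [[0.0] * 8 for _ in range(6)]
    for y in (2, 3):
        for x in (3, 4, 5, 6):
            image[y][x] = 1.0
    return image
```

Calling `cascade_depth(make_test_image(), focal_px=800.0, real_width_m=0.2)` returns the bounding box of the bright block together with a depth in metres. Restricting the second stage to the cropped region is the design point of a cascade: the depth stage sees only the workpiece rather than the whole large field of view.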