The mining industry faces significant challenges in production costs, environmental protection, and worker safety, necessitating the development of autonomous systems. This study presents the design and implementation of a robust rock centroid localization system for mining robotic applications, particularly rock-breaking hammers. The system comprises three phases: assembly, data acquisition, and data processing. Environmental sensing was accomplished using a Basler Blaze 101 three-dimensional (3D) Time-of-Flight (ToF) camera. The data processing phase incorporated advanced algorithms, including Bird’s-Eye View (BEV) image conversion and You Only Look Once (YOLO) v8x-Seg instance segmentation. The system’s performance was evaluated using a comprehensive dataset of 627 point clouds, including samples from real mining environments. The system achieved efficient processing times of approximately 5 s. Segmentation accuracy was evaluated using the Intersection over Union (IoU), reaching 95.10%. Localization precision was measured by the Euclidean distance in the XY plane (EDXY), achieving 0.0128 m. The normalized error (enorm) on the X and Y axes did not exceed 2.3%. Additionally, the system demonstrated high reliability with R2 values close to 1 for the X and Y axes, and maintained performance under various lighting conditions and in the presence of suspended particles. The Mean Absolute Error (MAE) in the Z axis was 0.0333 m, addressing challenges in depth estimation. A sensitivity analysis was conducted to assess the model’s robustness, revealing consistent performance across brightness and contrast variations, with an IoU ranging from 92.88% to 96.10%, while showing greater sensitivity to rotations.