Accurate calculation of ore rock fragmentation is important for the autonomous operation of mine excavators. However, neither modality alone can calculate fragmentation accurately: point clouds have low resolution, and images lack spatial position information. To solve this problem, we propose an ore rock fragmentation calculation method (ORFCM) based on the multi-modal fusion of point clouds and images. The ORFCM combines the strengths of both modalities: the fine-grained object segmentation of images and the spatial location information of point clouds. To address image under-segmentation, we propose a multiscale adaptive edge-detection method based on a novel standard deviation map that enhances weak edges. Furthermore, we propose an improved marker-based watershed segmentation algorithm to address the low segmentation accuracy caused by excessive noise in the gradient map and by weak edges being submerged. Experiments demonstrate that ORFCM can accurately calculate ore rock fragmentation in the local excavation area without relying on external markers for pixel calibration. The average error is 0.66 cm for the equivalent diameter of ore rock blocks, 1.42 cm for the elliptical long diameter, and 1.06 cm for the elliptical short diameter, which meets practical engineering needs.
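To make the three reported size metrics concrete, the following minimal sketch computes the equivalent diameter and the elliptical long and short diameters of one segmented block. It is an illustration under stated assumptions, not the ORFCM implementation: the function name `block_size_metrics`, the binary `mask` input, and the single `cm_per_pixel` scale are hypothetical, and the ellipse is obtained from second-order moments of the region, which may differ from the fitting procedure used in the method. In the fused setting the scale would come from the point cloud; here it is simply passed in.

```python
import numpy as np


def block_size_metrics(mask, cm_per_pixel):
    """Size metrics of one segmented block, in centimetres.

    mask         -- 2-D boolean array, True on the block's pixels (hypothetical input)
    cm_per_pixel -- assumed uniform pixel scale; a stand-in for the scale
                    that point cloud data would provide
    Returns (equivalent diameter, ellipse long diameter, ellipse short diameter).
    """
    ys, xs = np.nonzero(mask)
    area_px = ys.size
    if area_px == 0:
        raise ValueError("mask contains no foreground pixels")

    # Equivalent diameter: diameter of the circle with the same area as the region.
    equivalent_px = 2.0 * np.sqrt(area_px / np.pi)

    # Second central moments of the pixel coordinates (population covariance).
    coords = np.stack([xs, ys]).astype(float)
    cov = np.cov(coords, bias=True)
    # Each pixel is a unit square, which adds 1/12 to the variance on each axis.
    cov = cov + np.eye(2) / 12.0

    # A uniformly filled ellipse with semi-axis a has variance a^2 / 4 along
    # that axis, so the full axis length is 4 * sqrt(variance).
    eigenvalues = np.linalg.eigvalsh(cov)  # ascending order
    short_px = 4.0 * np.sqrt(eigenvalues[0])
    long_px = 4.0 * np.sqrt(eigenvalues[1])

    return (equivalent_px * cm_per_pixel,
            long_px * cm_per_pixel,
            short_px * cm_per_pixel)
```

For a roughly circular block all three values nearly coincide, while for an elongated block the long and short diameters diverge and the equivalent diameter falls between them. A single scalar scale is the simplest possible choice; with real depth data the scale would vary across the image with the distance of each block from the sensor.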