Traditional and convolutional neural network (CNN)-based geographic object-based image analysis (GeOBIA) methods for land-cover classification have flourished in remote sensing and produced many notable results. However, further improvement is hindered by a bottleneck: the limited information content of very high spatial resolution images (VHSRIs). Specifically, different objects with similar spectra and the absence of topographic (height) information are inherent drawbacks of VHSRIs. Multisource data therefore offer a promising way forward. First, for data fusion, this paper proposed a standard normalized digital surface model (StdnDSM) method, a height layer derived from a digital terrain model (DTM) and a digital surface model (DSM), to break through this bottleneck by fusing VHSRIs with point clouds. The StdnDSM smoothed the height data and improved the fusion of point clouds and VHSRIs, which benefited the subsequent classification. The fused data were then used to perform multiresolution segmentation (MRS) and served as training data for the CNN. Moreover, the grey-level co-occurrence matrix (GLCM) was introduced for a stratified MRS. Second, for data processing, the stratified MRS was more efficient than unstratified MRS, and its result was theoretically more rational and explainable than that of traditional global segmentation. Finally, the class of each segmented polygon was determined by majority voting. Compared with pixel-based and traditional object-based classification methods, the majority voting strategy is more robust and avoids misclassifications caused by a few misclassified centre points. Experimental results suggested that the proposed method is promising for object-based classification.
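The abstract does not give the exact StdnDSM formula or the voting implementation; the sketch below only illustrates the two core steps under common assumptions: the height layer is taken as nDSM = DSM − DTM followed by a standard-score normalization before being stacked with the image bands, and each segment receives the most frequent per-pixel CNN label inside it. All array names (`dsm`, `dtm`, `image`, `segments`, `pixel_preds`) are illustrative placeholders, not the paper's own code.

```python
import numpy as np

def std_ndsm(dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """Normalized DSM: object heights above terrain, standard-score scaled.

    Assumption: "standard normalized" is interpreted as (nDSM - mean) / std;
    the paper's exact StdnDSM definition may differ.
    """
    ndsm = dsm - dtm                                  # heights above ground (nDSM)
    return (ndsm - ndsm.mean()) / (ndsm.std() + 1e-9)

def fuse(image: np.ndarray, dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """Stack the StdnDSM as an extra band onto the VHSR image: (H, W, B) -> (H, W, B+1)."""
    height_band = std_ndsm(dsm, dtm)[..., np.newaxis]
    return np.concatenate([image, height_band], axis=-1)

def majority_vote(segments: np.ndarray, pixel_preds: np.ndarray) -> dict:
    """Assign each segment (polygon id) the most frequent per-pixel CNN label."""
    labels = {}
    for seg_id in np.unique(segments):
        votes = pixel_preds[segments == seg_id]
        values, counts = np.unique(votes, return_counts=True)
        labels[seg_id] = values[np.argmax(counts)]
    return labels

# Minimal usage with synthetic arrays standing in for real rasters.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 4))                 # 4-band VHSR image
dtm = rng.random((64, 64))                      # bare-earth heights
dsm = dtm + rng.random((64, 64)) * 10           # surface heights
fused = fuse(image, dsm, dtm)                   # (64, 64, 5) input for MRS / CNN
segments = rng.integers(0, 10, size=(64, 64))   # segment ids from MRS (placeholder)
pixel_preds = rng.integers(0, 3, size=(64, 64)) # per-pixel CNN predictions (placeholder)
segment_labels = majority_vote(segments, pixel_preds)
```

As the abstract notes, voting over all pixels of a segment is more robust than labelling the segment from a single centre pixel, since a few misclassified pixels cannot flip the segment's class.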