Modern satellite and aerial imagery captures increasingly complex types of ground objects as land resources continue to develop and change. A single remote-sensing modality is not sufficient for accurate extraction and classification of ground objects. Hyperspectral imaging has been widely used for ground object classification because of its high resolution, large number of bands, and abundant spatial and spectral information. Airborne light detection and ranging (LiDAR) point-cloud data, in turn, contain high-precision three-dimensional (3D) spatial information, which can enrich ground object classifiers with height features that hyperspectral images lack. Therefore, fusing hyperspectral image data with airborne LiDAR point-cloud data is an effective approach to ground object classification. In this paper, the effectiveness of such a fusion scheme is investigated and confirmed on an observation area in the middle reaches of the Heihe River in China. By combining the characteristics of hyperspectral compact airborne spectrographic imager (CASI) data and airborne LiDAR data, we extracted a variety of features for data fusion and ground object classification. First, we used the minimum noise fraction transform to reduce the dimensionality of the hyperspectral CASI images. Then, spatio-spectral and textural features of these images were extracted based on the normalized vegetation index and gray-level co-occurrence matrices. Next, canopy height features were extracted from the airborne LiDAR data. Finally, a hierarchical fusion scheme was applied to the hyperspectral CASI and airborne LiDAR features, and the fused features were used to train a residual network for high-accuracy ground object classification. The experimental results showed that the proposed hierarchical-fusion multiscale dilated residual network (M-DRN) reached an overall classification accuracy of 97.89%. 
This accuracy is 10.13% and 5.68% higher than those of the convolutional neural network (CNN) and the dilated residual network (DRN), respectively. Spatio-spectral and textural features of hyperspectral CASI images can complement the canopy height features of airborne LiDAR data. Together, these complementary features provide richer and more accurate information for ground object classification than any individual feature, and can thus outperform features based on a single remote-sensing modality.
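As a rough illustration of how per-pixel features from the two modalities might be assembled, the following minimal sketch computes a vegetation index and a canopy height layer and stacks them with reduced hyperspectral components. It rests on several assumptions that are not taken from the paper: the vegetation index is computed with the standard NDVI formula, canopy height is approximated as a LiDAR-derived surface model minus a terrain model, the minimum noise fraction components are replaced by random placeholder values, and plain channel concatenation stands in for the hierarchical fusion scheme. All function and variable names are hypothetical.

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    """Standard NDVI, (NIR - red) / (NIR + red); assumed form of the vegetation index."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero

def canopy_height(dsm, dtm):
    """Assumed canopy height: surface model minus terrain model, clipped at zero."""
    return np.clip(dsm - dtm, 0.0, None)

def stack_features(*layers):
    """Concatenate 2D (H, W) and 3D (H, W, C) layers along the channel axis.

    Simple stacking is an illustrative stand-in, not the paper's hierarchical fusion.
    """
    planes = [l[..., np.newaxis] if l.ndim == 2 else l for l in layers]
    return np.concatenate(planes, axis=-1)

# Toy 4 x 4 scene with placeholder values (not real CASI or LiDAR data).
rng = np.random.default_rng(0)
mnf = rng.normal(size=(4, 4, 3))   # placeholder for 3 reduced hyperspectral components
nir = np.full((4, 4), 0.6)         # near-infrared reflectance
red = np.full((4, 4), 0.2)         # red reflectance
dsm = np.full((4, 4), 105.0)       # surface elevation in metres
dtm = np.full((4, 4), 100.0)       # ground elevation in metres

fused = stack_features(mnf, ndvi(nir, red), canopy_height(dsm, dtm))
print(fused.shape)  # one feature vector of length 5 per pixel
```

In this sketch each pixel ends up with a single feature vector that a classifier could consume; textural features such as gray-level co-occurrence statistics would be appended as further channels in the same way.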