The fusion of the hyperspectral image (HSI) and the light detecting and ranging (LiDAR) data has a wide range of applications. This paper proposes a novel feature fusion method for urban area classification, namely the relative total variation structure analysis (RTVSA), to combine various features derived from HSI and LiDAR data. In the feature extraction stage, a variety of high-performance methods including the extended multi-attribute profile, Gabor filter, and local binary pattern are used to extract the features of the input data. The relative total variation is then applied to remove useless texture information of the processed data. Finally, nonparametric weighted feature extraction is adopted to reduce the dimensions. Random forest and convolutional neural networks are utilized to evaluate the fusion images. Experiments conducted on two urban Houston University datasets (including Houston 2012 and the training portion of Houston 2017) demonstrate that the proposed method can extract the structural correlation from heterogeneous data, withstand a noise well, and improve the land cover classification accuracy.