With the increasing availability of remote sensing data, building models that can interpret complex remote sensing data has attracted wide attention. Among remote sensing modalities, hyperspectral images (HSI) carry rich spectral information, while LiDAR provides elevation information; better ways to fuse features from these different sources are therefore needed. In this paper, we introduce semantic understanding to dynamically fuse data from the two sources: features of HSI and LiDAR are extracted through separate capsule network branches, and the self-supervised loss and random rigid rotation of canonical capsules are extended to the high-dimensional setting. Canonical capsules compute a capsule decomposition of objects through permutation-equivariant attention, and the process is self-supervised by training on pairs of randomly rotated objects. After fusing the HSI and LiDAR features with semantic understanding, unsupervised extraction of spectral-spatial-elevation fusion features is achieved. Experiments on two real-world HSI-LiDAR datasets show that the proposed multi-branch high-dimensional canonical capsule algorithm is effective for semantic understanding of HSI and LiDAR, and that it extracts HSI and LiDAR features more effectively than existing models for unsupervised extraction of multi-source remote sensing data.
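As an illustrative sketch only (the formulation and the symbols $R_1$, $R_2$, $\theta_k$, $\beta_k$, $\lambda$ below are assumptions for exposition, not the paper's exact loss), the rotation-based self-supervision of canonical capsules can be expressed as a consistency objective over two randomly transformed copies of the same input $X$:
\[
\mathcal{L}_{\mathrm{ss}}
= \sum_{k=1}^{K} \big\| R_2 R_1^{-1}\,\theta_k(R_1 X) - \theta_k(R_2 X) \big\|_2^2
+ \lambda \sum_{k=1}^{K} \big\| \beta_k(R_1 X) - \beta_k(R_2 X) \big\|_2^2 ,
\]
where $\theta_k(\cdot)$ denotes the pose of capsule $k$ produced by the permutation-equivariant attention encoder, $\beta_k(\cdot)$ its descriptor, $R_1$ and $R_2$ random rigid rotations (extended in this work to the high-dimensional setting of HSI and LiDAR features), and $\lambda$ a weighting hyperparameter. The first term encourages capsule poses to transform equivariantly with the input, while the second encourages descriptors to remain invariant under rotation.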