In this work, a novel object-level building-matching method using cross-dimensional data, including 2D images and 3D point clouds, is proposed. The core of this method is a newly proposed plug-and-play Joint Descriptor Extraction Module (JDEM) that is used to extract descriptors containing buildings’ three-dimensional shape information from object-level remote sensing data of different dimensions for matching. The descriptor is named Signed Distance Descriptor (SDD). Due to differences in the inherent properties of different dimensional data, it is challenging to match buildings’ 2D images and 3D point clouds on the object level. In addition, features extracted from the same building in images taken at different angles are usually not exactly identical, which will also affect the accuracy of cross-dimensional matching. Therefore, the question of how to extract accurate, effective, and robust joint descriptors is key to cross-dimensional matching. Our JDEM maps different dimensions of data to the same 3D descriptor SDD space through the 3D geometric invariance of buildings. In addition, Multi-View Adaptive Loss (MAL), proposed in this paper, aims to improve the adaptability of the image encoder module to images with different angles and enhance the robustness of the joint descriptors. Moreover, a cross-dimensional object-level data set was created to verify the effectiveness of our method. The data set contains multi-angle optical images, point clouds, and the corresponding 3D models of more than 400 buildings. A large number of experimental results show that our object-level cross-dimensional matching method achieves state-of-the-art outcomes.