Building extraction has attracted much attention for decades as a prerequisite for many applications, and it remains a challenging topic in photogrammetry and remote sensing. Building extraction from point clouds is especially difficult because point clouds lack spectral information, involve massive amounts of data, and resist approaches that generalize across scenes. In this paper, a novel deep-learning-based framework is proposed for building extraction from point cloud data. First, a sample generation method is proposed that splits the preprocessed multi-spectral light detection and ranging (LiDAR) data into numerous samples, which can be fed directly into convolutional neural networks and which completely cover the original input. Then, a graph geometric moments (GGM) convolution is proposed to encode the local geometric structure of point sets. In addition, a hierarchical architecture built on GGM convolution, called the GGM convolutional neural network, is proposed to learn and recognize building points. Finally, test scenes of varying sizes can be fed into the framework to obtain point-wise extraction results. We evaluate the proposed framework and methods on airborne multi-spectral LiDAR point clouds collected by an Optech Titan system. Compared with previous state-of-the-art networks designed for point cloud segmentation, our method achieves the best performance on two test areas, with a correctness of 95.1%, a completeness of 93.7%, an F-measure of 94.4%, and an intersection over union (IoU) of 89.5%. The experimental results confirm the effectiveness and efficiency of the proposed framework and methods.
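The four scores reported above are point-wise metrics for a binary building / non-building labelling. The sketch below is a minimal illustration assuming the standard definitions used in building-extraction evaluation: correctness = TP / (TP + FP) (precision), completeness = TP / (TP + FN) (recall), F-measure as their harmonic mean, and IoU = TP / (TP + FP + FN). The names `Scores` and `building_scores` are hypothetical, and this is not the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    correctness: float   # precision: share of predicted building points that are buildings
    completeness: float  # recall: share of true building points that were found
    f_measure: float     # harmonic mean of correctness and completeness
    iou: float           # intersection over union of the building class


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class is empty (e.g. no predictions).
    return num / den if den else 0.0


def building_scores(pred: Sequence[bool], truth: Sequence[bool]) -> Scores:
    """Compute point-wise scores; one label per point, True = building."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have one label per point")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # correctly labelled building points
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # non-building points labelled building
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)  # building points that were missed
    correctness = _ratio(tp, tp + fp)
    completeness = _ratio(tp, tp + fn)
    f_measure = _ratio(2 * correctness * completeness, correctness + completeness)
    iou = _ratio(tp, tp + fp + fn)
    return Scores(correctness, completeness, f_measure, iou)


# Usage on six hypothetical points: 3 TP, 1 FP, 1 FN.
scores = building_scores(
    [True, True, True, False, False, True],
    [True, True, False, True, False, True],
)
print(scores)  # correctness 0.75, completeness 0.75, f_measure 0.75, iou 0.6
```

Under these definitions the F-measure depends only on correctness and completeness, so the reported values can be cross-checked: 2 × 0.951 × 0.937 / (0.951 + 0.937) ≈ 0.944, consistent with the reported F-measure of 94.4%.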