Unmanned aerial vehicles (UAVs) capture oblique point clouds of outdoor scenes that contain considerable building information. Building features extracted from images are affected by viewpoint, illumination, occlusion, noise and imaging conditions, which makes them difficult to extract. Ground elevation changes provide a powerful aid for building extraction, and point cloud data precisely reflect this information; oblique photogrammetry point clouds are therefore of significant research value. Traditional building extraction methods filter and sort the raw data to separate buildings, which causes the point clouds to lose spatial information and reduces extraction accuracy. We therefore develop an intelligent building extraction method based on deep learning that incorporates an attention mechanism module into the Sampling and PointNet operations within the set abstraction layer of the PointNet++ network. To assess the efficacy of our approach, we train the model on, and extract buildings from, a dataset created using UAV oblique point clouds from five regions in the city of Bengbu, China. The method achieves 95.7% intersection over union, 96.5% accuracy, 96.5% precision, 98.7% recall and a 97.8% F1 score. With the addition of the attention mechanism, the overall training accuracy of the model improves by about 3%. The method shows potential for advancing the accuracy and efficiency of digital urbanization construction projects.
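For readers less familiar with the reported metrics, the following is a minimal sketch of how intersection over union, accuracy, precision, recall and F1 score are conventionally computed from per-point building/non-building labels. It uses the standard textbook definitions and is not taken from the authors' evaluation code. The function name `binary_segmentation_metrics` and the toy labels are hypothetical, and how the paper aggregates scores across regions or classes is not stated here, so that step is not modeled.

```python
def binary_segmentation_metrics(predicted, reference):
    """Compute standard metrics for binary per-point labels (1 = building, 0 = other).

    Hypothetical helper for illustration; assumes both sequences have equal length.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)
    tn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 0)

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy example: eight points, four of which are true building points.
reference_labels = [1, 1, 1, 0, 0, 1, 0, 0]
predicted_labels = [1, 1, 0, 0, 1, 1, 0, 0]
metrics = binary_segmentation_metrics(predicted_labels, reference_labels)
print(metrics)
```

In the toy example there are three true positives, one false positive, one false negative and three true negatives, so the IoU is 3/5 while accuracy, precision, recall and F1 are all 3/4.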