Buildings are significant components of digital cities, and their precise extraction is essential for the three-dimensional modeling of cities. However, it is difficult to accurately extract building features effectively in complex scenes, especially where trees and buildings are tightly adhered. This paper proposes a highly accurate building point cloud extraction method based solely on the geometric information of points in two stages. The coarsely extracted building point cloud in the first stage is iteratively refined with the help of mask polygons and the region growing algorithm in the second stage. To enhance accuracy, this paper combines the Alpha Shape algorithm with the neighborhood expansion method to generate mask polygons, which help fill in missing boundary points caused by the region growing algorithm. In addition, this paper performs mask extraction on the original points rather than non-ground points to solve the problem of incorrect identification of facade points near the ground using the cloth simulation filtering algorithm. The proposed method has shown excellent extraction accuracy on the Urban-LiDAR and Vaihingen datasets. Specifically, the proposed method outperforms the PointNet network by 20.73% in precision for roof extraction of the Vaihingen dataset and achieves comparable performance with the state-of-the-art HDL-JME-GGO network. Additionally, the proposed method demonstrated high accuracy in extracting building points, even in scenes where buildings were closely adjacent to trees.