Airborne array-interferometric synthetic aperture radar (array-InSAR), one of the implementation methods of tomographic SAR (TomoSAR), has the advantages of all-time, all-weather, high consistency, and exceptional timeliness. As urbanization continues to develop, the utilization of array-InSAR data for building detection holds significant application value. Existing methods, however, face challenges in terms of automation and detection accuracy, which can impact the subsequent accuracy and quality of building modeling. On the other hand, deep learning methods are still in their infancy in SAR point cloud processing. Existing deep learning methods do not adapt well to this problem. Therefore, we propose a Position-Feature Attention Network (PFA-Net), which seamlessly integrates positional encoding with point transformer for SAR point clouds building target segmentation tasks. Experimental results show that the proposed network is better suited to handle the inherent characteristics of SAR point clouds, including high noise levels and multiple scattering artifacts. And it achieves more accurate segmentation results while maintaining computational efficiency and avoiding errors associated with manual labeling. The experiments also investigate the role of multidimensional features in SAR point cloud data. This work also provides valuable insights and references for future research between SAR point clouds and deep learning.