SAR tomography (TomoSAR) is an important technology for three-dimensional (3D) reconstruction of buildings through multiple coherent SAR images. In order to obtain sufficient signal-to-noise ratio (SNR), typical TomoSAR applications often require dozens of scenes of SAR images. However, limited by time and cost, the available SAR images are often only 3–5 scenes in practice, which makes the traditional TomoSAR technique unable to produce satisfactory SNR and elevation resolution. To tackle this problem, the conditional generative adversarial network (CGAN) is proposed to improve the TomoSAR 3D reconstruction by learning the prior information of building. Moreover, the number of tracks required can be reduced to three. Firstly, a TomoSAR 3D super-resolution dataset is constructed using high-quality data from the airborne array and low-quality data obtained from a small amount of tracks sampled from all observations. Then, the CGAN model is trained to estimate the corresponding high-quality result from the low-quality input. Airborne data experiments prove that the reconstruction results are improved in areas with and without overlap, both qualitatively and quantitatively. Furthermore, the network pretrained on the airborne dataset is directly used to process the spaceborne dataset without any tuning, and generates satisfactory results, proving the effectiveness and robustness of our method. The comparative experiment with nonlocal algorithm also shows that the proposed method has better height estimation and higher time efficiency.