Synthetic aperture radar (SAR) tomography (TomoSAR) is a multibaseline interferometric technique that estimates the power spectrum pattern (PSP) along the perpendicular to the line-of-sight (PLOS) direction. TomoSAR separates individual scatterers in layover areas, enabling the 3D representation of urban zones. Such scenes are typically characterized by buildings of different heights, with layover between the facades of the taller structures, the rooftops of the lower buildings and the ground surface. Multilooking, which most spectral estimation techniques require, reduces the azimuth-range spatial resolution because it averages adjacent values, e.g., via boxcar filtering. To avoid the spatial mixing of sources caused by multilooking, this article proposes a novel methodology for single-look TomoSAR over urban areas. First, a version of Capon that is robust to the rank deficiency of the data covariance matrices is applied to focus the TomoSAR data. Afterward, the recovered PSP is refined using statistical regularization, which enhances resolution, suppresses artifacts and reduces the ambiguity levels. The capabilities of the proposed methodology are demonstrated using strip-map airborne data of the Jet Propulsion Laboratory (JPL) and the National Aeronautics and Space Administration (NASA), acquired by the uninhabited aerial vehicle SAR (UAVSAR) system over the urban area of Munich, Germany, in 2015. Using multipolarization data [horizontal/horizontal (HH), horizontal/vertical (HV) and vertical/vertical (VV)], the methodology is compared against popular focusing techniques for urban monitoring, namely matched filtering, Capon and compressive sensing (CS).
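To make the first step concrete, the following is a minimal sketch of single-look Capon focusing made usable by diagonal loading. Diagonal loading is an assumption used here as a stand-in and is not necessarily the robust Capon variant proposed in the article. With a single look, the sample covariance matrix y y^H has rank one, so it cannot be inverted directly. Adding a small multiple of the identity restores invertibility, and the inverse then follows in closed form from the Sherman-Morrison identity. The steering-vector phase convention, the function names, and all geometry values (baselines, wavelength, slant range, loading factor) are hypothetical and do not come from the UAVSAR data set.

```python
import cmath
import math


def steering_vector(baselines, elevation, wavelength, slant_range):
    """Phase ramp across the passes for a scatterer at one PLOS elevation.

    Assumed convention: phase = 4*pi * b * z / (wavelength * slant_range).
    """
    k = 4.0 * math.pi / (wavelength * slant_range)
    return [cmath.exp(1j * k * b * elevation) for b in baselines]


def single_look_capon(y, baselines, elevations, wavelength, slant_range, loading):
    """Capon pseudo-spectrum from a single look, using diagonal loading.

    The loaded covariance R = y y^H + loading * I is inverted in closed form:
        R^{-1} = (I - y y^H / (loading + ||y||^2)) / loading
    so that a^H R^{-1} a = (||a||^2 - |a^H y|^2 / (loading + ||y||^2)) / loading.
    """
    y_energy = sum(abs(v) ** 2 for v in y)
    spectrum = []
    for z in elevations:
        a = steering_vector(baselines, z, wavelength, slant_range)
        a_energy = float(len(a))  # every entry has unit modulus
        corr = sum(ai.conjugate() * yi for ai, yi in zip(a, y))  # a^H y
        quad_form = (a_energy - abs(corr) ** 2 / (loading + y_energy)) / loading
        spectrum.append(1.0 / quad_form)
    return spectrum


# Hypothetical acquisition geometry (illustrative values only).
baselines = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]  # perpendicular baselines [m]
wavelength = 0.24                                 # [m], roughly L-band
slant_range = 12000.0                             # [m]
elevations = list(range(-30, 31))                 # PLOS search grid [m]

# One noiseless scatterer at 10 m elevation, observed with a single look.
y = steering_vector(baselines, 10.0, wavelength, slant_range)
psp = single_look_capon(y, baselines, elevations, wavelength, slant_range, loading=0.1)
peak = elevations[psp.index(max(psp))]
```

In this noiseless toy case the pseudo-spectrum peaks at the simulated elevation of 10 m. The loading factor trades resolution against stability: as it grows, the estimate approaches matched filtering, whereas very small values make the output sensitive to noise.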