This paper proposes an effective scheme for detecting the number of buildings in a scene from a single high-resolution Synthetic Aperture Radar (SAR) image. Layover and double-bounce echoes in the SAR image are first detected as building elements, which are then split, merged, or discarded so that each resulting patch corresponds to one building. To guide the splitting, a model is constructed that describes the statistical relationship between the number of buildings and the features of the detected building elements. Based on this model, large building patches are split into the appropriate number of smaller patches. The scheme is tested on 3-m and 6-m resolution TerraSAR-X images covering two sites in different provinces of China, and its advantages and limitations are discussed.
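As a rough illustration of the model-based splitting step, the sketch below fits a simple count model and uses its prediction to divide one large patch. It is a minimal sketch under stated assumptions, not the method of this paper: the linear least-squares form of the model, the two example features (patch length and area), and the rule of splitting along the patch's principal axis are all hypothetical, as are the function names `fit_count_model`, `predict_count`, and `split_patch`.

```python
import numpy as np

def fit_count_model(features, counts):
    """Fit counts ~ features @ w + b by ordinary least squares.

    ASSUMPTION: a linear model stands in for the statistical relationship
    between patch features and building count; the actual form may differ.
    """
    features = np.asarray(features, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Append a column of ones so the last coefficient acts as the intercept.
    design = np.column_stack([features, np.ones(len(features))])
    coeffs, *_ = np.linalg.lstsq(design, counts, rcond=None)
    return coeffs

def predict_count(coeffs, feature_row):
    """Predict an integer building count (at least 1) for one patch."""
    x = np.append(np.asarray(feature_row, dtype=float), 1.0)
    return max(1, int(round(float(x @ coeffs))))

def split_patch(pixels, k):
    """Split a patch's pixel coordinates into k groups along its principal axis.

    ASSUMPTION: equal-size splitting along the dominant direction is a
    hypothetical stand-in for the splitting rule.
    """
    pixels = np.asarray(pixels, dtype=float)
    centered = pixels - pixels.mean(axis=0)
    # The first right singular vector gives the direction of largest extent.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[0]
    order = np.argsort(projection)
    return [pixels[idx] for idx in np.array_split(order, k)]

# Hypothetical training data: (length, area) per patch and known building counts.
train_features = [[10, 50], [20, 120], [30, 150], [40, 260]]
train_counts = [1, 2, 3, 4]
coeffs = fit_count_model(train_features, train_counts)

# A large, elongated patch: estimate its building count, then split it.
large_patch = [(0, col) for col in range(9)]
n_buildings = predict_count(coeffs, [30, 200])
sub_patches = split_patch(large_patch, n_buildings)
```

In practice the features and the model would be chosen from labelled training patches, and the splitting geometry would follow the layover and double-bounce structure rather than a plain principal-axis cut.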