Building heights in urban areas are useful for spatial analysis, urban planning, and city management. To this end, a novel method for calculating building heights in an urban area is proposed based on street view images and a deep learning model, namely the mask region-based convolutional neural network (Mask R-CNN). First, a web crawler for street view maps was developed, and an optimization model for observation locations was designed based on a genetic algorithm, so that street view images of all buildings can be obtained with the minimum number of downloads. Subsequently, a deep learning workflow based on Mask R-CNN was designed to detect buildings in the panorama images. Finally, an accurate height calculation model that accounts for repeated detections of the same building was developed by mapping detected buildings to actual buildings. Case studies indicate that the mean error of the height calculation is 0.78 m and the average calculation time is 4.57 s per building, which suggests that the proposed method is both accurate and efficient for application in urban areas.
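
To illustrate the final step, the following minimal sketch shows one way a building height could be estimated from a detected roofline in a panorama and then fused across repeated detections. It is not the height calculation model of this work. It assumes an equirectangular panorama, flat ground, a known horizontal distance from the camera to the facade, and a camera height of 2.5 m, and it uses the median as the fusion rule. All function names and parameters are hypothetical.

```python
import math
from statistics import median


def pitch_from_row(row: float, image_height: int) -> float:
    """Elevation angle in degrees for a pixel row of an equirectangular panorama.

    Assumption: row 0 maps to +90 degrees (zenith) and row image_height
    maps to -90 degrees (nadir), so the horizon lies at image_height / 2.
    """
    return 90.0 - 180.0 * row / image_height


def height_from_observation(distance_m: float,
                            pitch_top_deg: float,
                            camera_height_m: float = 2.5) -> float:
    """Building height from one observation, by simple trigonometry.

    Assumptions: flat ground, a vertical facade at horizontal distance
    distance_m, and a roofline seen at elevation angle pitch_top_deg.
    The default camera height is an illustrative value only.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    if not -90.0 < pitch_top_deg < 90.0:
        raise ValueError("pitch_top_deg must lie strictly between -90 and 90")
    return camera_height_m + distance_m * math.tan(math.radians(pitch_top_deg))


def fuse_heights(estimates: list[float]) -> float:
    """Combine repeated estimates of the same building.

    The median is an assumed fusion rule, chosen here because it is
    robust to a single poor detection.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    return median(estimates)
```

For example, a roofline seen at an elevation angle of 45 degrees from 10 m away gives a height of approximately 12.5 m under these assumptions. If the same building is detected in several panoramas, `fuse_heights` reduces the per-observation estimates to a single value.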