A novel region-based 3D semantic mapping method is proposed for urban scenes. The proposed Semantic Urban Maps (SUM) method assigns geometric and semantic class labels simultaneously to the regions of segmented images, using a classification framework based on a Markov Random Field. The pixels of the labeled images are back-projected into a set of 3D point clouds using stereo disparity. The point clouds are then registered using the estimated camera motion, yielding a coherent semantic map representation. SUM is evaluated on five urban benchmark sequences and is shown to recover both geometric and semantic labels successfully. A comparison with a relevant state-of-the-art method shows that SUM is competitive and achieves higher average pixel-wise accuracy than the competing method.
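To illustrate the back-projection step, the following is a minimal sketch assuming a rectified pinhole stereo pair, where depth follows from disparity as Z = f·B/d. The function name `back_project` and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical and chosen for illustration; this is not the SUM implementation itself.

```python
from typing import List, Tuple

# A labeled 3D point: (X, Y, Z, class_label). Hypothetical type for illustration.
LabeledPoint = Tuple[float, float, float, int]


def back_project(
    disparity: List[List[float]],
    labels: List[List[int]],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
    min_disparity: float = 1e-6,
) -> List[LabeledPoint]:
    """Back-project labeled pixels into 3D using stereo disparity.

    Assumes a rectified pinhole stereo model (an assumption, not taken
    from the source): Z = f * B / d, X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    Pixels whose disparity is at or below min_disparity are skipped.
    """
    points: List[LabeledPoint] = []
    for v, (disp_row, label_row) in enumerate(zip(disparity, labels)):
        for u, (d, label) in enumerate(zip(disp_row, label_row)):
            if d <= min_disparity:
                continue  # no valid depth for this pixel
            z = focal_px * baseline_m / d
            x = (u - cx) * z / focal_px
            y = (v - cy) * z / focal_px
            points.append((x, y, z, label))
    return points


# Toy 2x2 example with made-up values; one pixel has invalid disparity.
toy_disparity = [[10.0, 0.0], [5.0, 20.0]]
toy_labels = [[1, 2], [3, 4]]
cloud = back_project(toy_disparity, toy_labels,
                     focal_px=100.0, baseline_m=0.5, cx=0.5, cy=0.5)
```

Each returned point carries the class label of its source pixel, so registering the per-frame clouds with the estimated motion would place the labels in a common map frame.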