Fig. 1: We demonstrate object-oriented semantic mapping using RGB-D data that scales from small desktop environments (left) to offices (middle) and whole labs (right). The pictures show 3D map structures with objects colored according to their semantic class. We do not merely project semantic labels onto individual 3D points, but rather maintain objects as the central entities of the map, freeing it from the requirement for a-priori known 3D object models as in [1]. To achieve this, our system creates and extends 3D object models while continuously mapping the environment. Object detection and classification are performed by a Convolutional Network, while an unsupervised 3D segmentation algorithm assigns a segment of 3D points to every object detection. These segmented object detections are then either fused with existing objects or added to the map as new objects. ORB-SLAM2 provides a global SLAM solution that enables us to reconstruct a 3D model of the environment containing both non-object structure and objects of various types.

Abstract: For intelligent robots to interact in meaningful ways with their environment, they must understand both the geometric and semantic properties of the scene surrounding them. The majority of research to date has addressed these mapping challenges separately, focusing on either geometric or semantic mapping. In this paper we address the problem of building environmental maps that include both semantically meaningful, object-level entities and point- or mesh-based geometric representations. We simultaneously build geometric point cloud models of previously unseen instances of known object classes and create a map that contains these object models as central entities. Our system leverages sparse, feature-based RGB-D SLAM, image-based deep-learning object detection, and 3D unsupervised segmentation.
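To make the per-frame data flow described above concrete, the following is a minimal Python sketch of one detect-segment-fuse step, written under stated assumptions rather than as the paper's actual implementation. All names (MapObject, detect_objects, segment_points, the centroid-distance association with its 0.3 m threshold) are illustrative placeholders; in particular, the nearest-centroid association stands in for whatever data-association criterion the full system uses.

```python
import numpy as np

class MapObject:
    """Hypothetical object-level map entity: an accumulated 3D point
    model plus per-class detection counts (not the authors' API)."""
    def __init__(self, points, label):
        self.points = points                 # (N, 3) points in map frame
        self.class_votes = {label: 1}        # detection counts per class

    def fuse(self, points, label):
        # Extend the object's point-cloud model and update its votes.
        self.points = np.vstack([self.points, points])
        self.class_votes[label] = self.class_votes.get(label, 0) + 1

    def centroid(self):
        return self.points.mean(axis=0)

def associate(objects, points, max_dist=0.3):
    """Match a segmented detection to an existing map object by
    centroid distance; returns None if it should become a new object.
    The threshold is an assumed placeholder value."""
    centroid = points.mean(axis=0)
    best, best_d = None, max_dist
    for obj in objects:
        d = np.linalg.norm(obj.centroid() - centroid)
        if d < best_d:
            best, best_d = obj, d
    return best

def process_frame(rgb, depth, T_wc, objects, detect_objects, segment_points):
    """One mapping step: detect, segment, lift into the map frame,
    then fuse with an existing object or insert a new one.

    detect_objects(rgb)            -> list of (bounding_box, class_label)
    segment_points(depth, box)     -> (N, 3) points in the camera frame
    T_wc                           -> 4x4 camera-to-world pose from SLAM
    """
    for box, label in detect_objects(rgb):
        pts_cam = segment_points(depth, box)      # unsupervised 3D segment
        # Lift the segment into the map frame using the SLAM pose.
        pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
        pts_map = (T_wc @ pts_h.T).T[:, :3]
        match = associate(objects, pts_map)
        if match is not None:
            match.fuse(pts_map, label)            # extend existing model
        else:
            objects.append(MapObject(pts_map, label))  # new map entity
    return objects
```

In this sketch the map remains a list of object entities rather than a labeled point cloud, mirroring the paper's emphasis on objects as the central representation; the SLAM backend (ORB-SLAM2 in the paper) only enters through the pose T_wc.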