As-is building models are becoming increasingly common in the Architecture, Engineering, and Construction industry, with many stakeholders requesting this information throughout the lifecycle of a building. The ready availability of devices equipped with RGB cameras and depth sensors simplifies the task of capturing and reconstructing an environment (scene) as a spatial 3D mesh or point cloud. However, converting this purely geometric information into a semantically meaningful as-is building model is non-trivial. State-of-the-art practice follows a first step of acquiring the spatial 3D mesh on site and subsequently resorts to manual or assisted semantic labeling in the office, where experts often have to work for many hours with non-intuitive and error-prone tools. To address this inefficiency, we develop HoloLabel, an Augmented Reality application on HoloLens that allows users to annotate a scene in 3D with rich semantic information directly on site while simultaneously capturing its spatial 3D mesh. Our tool follows a user-in-the-loop protocol to perform the task of 3D semantic segmentation, i.e., each face of the 3D mesh should be annotated with a semantic label. We leverage the HoloLens Spatial Mapping feature to build a 3D mesh of the scene while the user walks around; at intervals, we apply an automatic geometry-based segmentation algorithm to generate segmentation proposals. The user then assigns predefined semantic labels to the proposals and, if necessary, uses a virtual paintbrush to refine the proposed segments or create new ones. Finally, the user has the option to add rich semantic descriptions (e.g., material, shape, or relationship to another object) to segments using voice-to-text technology. We aim to lay the groundwork for leveraging upcoming mixed reality devices for intuitive, synchronous generation of semantic as-is building models directly in the real world.
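The exact geometry-based segmentation algorithm used to generate proposals is not specified above. For illustration only, the following is a minimal sketch of one common option for triangle meshes: region growing over face normals, where adjacent faces whose normals stay within an angular threshold of the seed face are grouped into one proposal. All names here (`segment_proposals`, `angle_deg`, the array-based mesh representation) are hypothetical, not HoloLabel's implementation.

```python
# Sketch of geometry-based segment proposals via normal-driven region growing.
# Assumption: the mesh is given as numpy arrays of vertices (V, 3) and
# triangular faces (F, 3); two faces are neighbors if they share an edge.
from collections import defaultdict, deque
import numpy as np


def face_normals(vertices, faces):
    """Unit normal per triangular face."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    n = np.cross(b - a, c - a)
    return n / np.maximum(np.linalg.norm(n, axis=1, keepdims=True), 1e-12)


def face_adjacency(faces):
    """Map each face index to the faces sharing an edge with it."""
    edge_to_faces = defaultdict(list)
    for f, (i, j, k) in enumerate(faces):
        for e in ((i, j), (j, k), (k, i)):
            edge_to_faces[tuple(sorted(e))].append(f)
    adj = defaultdict(set)
    for fs in edge_to_faces.values():
        for f in fs:
            adj[f].update(g for g in fs if g != f)
    return adj


def segment_proposals(vertices, faces, angle_deg=15.0):
    """Greedy region growing: a face joins a segment while its normal stays
    within angle_deg of the segment's seed normal. Returns a label per face."""
    normals = face_normals(vertices, faces)
    adj = face_adjacency(faces)
    cos_thresh = np.cos(np.radians(angle_deg))
    labels = np.full(len(faces), -1, dtype=int)
    current = 0
    for seed in range(len(faces)):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        queue = deque([seed])
        while queue:
            f = queue.popleft()
            for g in adj[f]:
                if labels[g] == -1 and normals[g] @ normals[seed] >= cos_thresh:
                    labels[g] = current
                    queue.append(g)
        current += 1
    return labels


# Toy usage: two perpendicular quads (a floor and a wall) yield two proposals.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],   # floor corners
                  [1, 0, 1], [1, 1, 1]], dtype=float)            # wall top edge
tris = np.array([[0, 1, 2], [0, 2, 3],        # floor triangles
                 [1, 4, 5], [1, 5, 2]])       # wall triangles
print(segment_proposals(verts, tris))  # -> [0 0 1 1]
```

Planar proposals of this kind suit indoor scans, where walls, floors, and large furniture surfaces dominate; the user-in-the-loop steps described above then handle the cases such a purely geometric heuristic gets wrong.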