Semantic 3D scene understanding is a fundamental problem in computer vision and robotics. Despite recent advances in deep learning, its application to multi-domain 3D semantic segmentation typically suffers from the lack of sufficiently extensive annotated 3D datasets. In contrast, 2D neural networks benefit from large amounts of existing training data and can be applied to a wider variety of environments, sometimes even without the need for retraining. In this paper, we present 'Diffuser', a novel and efficient multi-view fusion framework that leverages 2D semantic segmentations of multiple image views of a scene to produce a consistent and refined 3D segmentation. We formulate the 3D segmentation task as a transductive label diffusion problem on a graph, where multi-view and 3D geometric properties are used to propagate semantic labels from the 2D image space to the 3D map. Experiments conducted on challenging indoor and outdoor datasets demonstrate the versatility of our approach, as well as its effectiveness for both global 3D scene labeling and single RGB-D frame segmentation. Furthermore, we show a significant increase in 3D segmentation accuracy compared to the probabilistic fusion methods employed in several state-of-the-art multi-view approaches, with little computational overhead.
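To give a concrete sense of the kind of transductive label diffusion on a graph referred to above, the following is a minimal, self-contained sketch of classical graph-based label propagation (normalized-affinity diffusion with a prior-retention term). The graph construction, weights, and parameter values here are illustrative assumptions, not the implementation described in this paper.

```python
# Minimal sketch of transductive label diffusion on a graph.
# Illustration only: affinity matrix, alpha, and iteration count are assumed,
# not taken from the paper's method.
import numpy as np

def diffuse_labels(W, Y, alpha=0.9, n_iter=50):
    """Propagate soft labels over a graph.

    W     : (n, n) symmetric affinity matrix between 3D points / surfels.
    Y     : (n, c) initial label scores, e.g. 2D network softmax projected
            to 3D (rows of zeros for unobserved points).
    alpha : diffusion strength (0 < alpha < 1).
    Returns an (n, c) matrix of refined label scores.
    """
    d = W.sum(axis=1)
    d[d == 0] = 1.0                        # guard against isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt        # symmetrically normalized affinity
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y   # diffuse, keep initial prior
    return F

# Toy usage: 4 nodes on a chain, 2 classes, labels known only at the endpoints.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
print(diffuse_labels(W, Y).argmax(axis=1))  # labels spread along the chain
```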