In the field of digital cultural heritage (DCH), 2D/3D digitization strategies are becoming increasingly complex. The emerging trend of multimodal imaging (i.e., data acquisition campaigns that combine multi-sensor, multi-scale, multi-band and/or multi-epoch data concurrently) raises several challenges in terms of data provenance, data fusion and data analysis. Assuming that a usable multi-source 3D model is more meaningful than millions of aggregated points, this work explores a “reduce to understand” approach to increase the interpretative value of multimodal point clouds. Starting from several years of accumulated digitizations of a single use case, we define a method based on density estimation to compute a Multimodal Enhancement Fusion Index (MEFI) revealing the intricate modality layers behind the 3D coordinates. Seamlessly stored as point cloud attributes, MEFI can be rendered as a heat-map showing whether the underlying data are isolated and sparse or redundant and dense. Beyond these colour-coded quantitative features, a semantic layer is added to provide qualitative information about the data sources. Based on a versatile descriptive metadata schema (MEMoS), the 3D model resulting from the data fusion can therefore be semantically enriched with all the information concerning its digitization history. A customized 3D viewer is presented to explore this enhanced multimodal representation as a starting point for further 3D-based investigations.
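To make the density-estimation step concrete, the sketch below shows one possible form of such a per-point index: a k-nearest-neighbour density computed over a fused point cloud, normalized into a [0, 1] scalar attribute and colour-coded as a heat-map. The estimator, the parameters and the names (e.g. `density_index`, `k = 16`) are illustrative assumptions, not the published MEFI computation.

```python
# Illustrative sketch only: a kNN density estimate on a fused point cloud,
# normalized into a per-point [0, 1] index, stored as a scalar attribute and
# colour-coded as a heat-map. The estimator and all names are assumptions,
# not the paper's MEFI implementation.
import numpy as np
from scipy.spatial import cKDTree
from matplotlib import colormaps  # requires matplotlib >= 3.6


def density_index(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Per-point density index: ~0 for isolated/sparse points, ~1 for dense regions."""
    tree = cKDTree(points)
    # Distance to the k-th nearest neighbour (k + 1 because the first hit is the point itself).
    dists, _ = tree.query(points, k=k + 1)
    radius = dists[:, -1]
    # Density = k neighbours inside a sphere of that radius (points per unit volume).
    density = k / (4.0 / 3.0 * np.pi * radius**3 + 1e-12)
    log_d = np.log10(density + 1e-12)
    return (log_d - log_d.min()) / (log_d.max() - log_d.min() + 1e-12)


points = np.random.rand(100_000, 3)           # placeholder for the fused multimodal cloud
index = density_index(points)                 # stored as an extra per-point attribute
colours = colormaps["inferno"](index)[:, :3]  # RGB heat-map: sparse -> dark, dense -> bright
```

In practice the resulting scalar would be written back into the point cloud attributes (e.g. as an extra field alongside XYZ and colour) so that any viewer supporting per-point scalars can display the heat-map.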