“…Many methods use one data modality to supervise or inform another [1,3,37,38,42,48,65,68,69,76,93,98,104,130,148]. For 3D semantic segmentation, multiview fusion [2,55,73,84,86,89,133,145] is a popular family of methods that require only image supervision. However, these methods reason exclusively in the image domain and require an input 3D substrate such as a point cloud or polygonal mesh on which to aggregate 2D information.…”
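
To make the aggregation step concrete, the following is a minimal sketch of how per-pixel 2D labels could be fused onto a 3D point cloud by majority vote. It is an illustration only and not the procedure of any cited work. It assumes a pinhole camera model, known camera poses, and 2D label maps already produced by some image segmenter, and it ignores occlusion. The function name `fuse_multiview_labels` and all parameter names are hypothetical.

```python
import numpy as np

def fuse_multiview_labels(points, label_maps, intrinsics, world_to_cam, num_classes):
    """Aggregate per-pixel 2D class labels onto 3D points by majority vote.

    points:       (N, 3) world-space points (the 3D substrate).
    label_maps:   list of (H, W) integer arrays of per-pixel class ids.
    intrinsics:   list of (3, 3) pinhole matrices K (assumed last row [0, 0, 1]).
    world_to_cam: list of (4, 4) rigid transforms from world to camera frame.
    Returns an (N,) array of fused labels, with -1 for points seen by no view.
    """
    n = points.shape[0]
    votes = np.zeros((n, num_classes), dtype=np.int64)
    homog = np.concatenate([points, np.ones((n, 1))], axis=1)  # (N, 4)

    for labels, K, T in zip(label_maps, intrinsics, world_to_cam):
        h, w = labels.shape
        cam = (homog @ T.T)[:, :3]                 # points in the camera frame
        front = np.nonzero(cam[:, 2] > 1e-6)[0]    # keep points in front of the camera
        pix = cam[front] @ K.T                     # homogeneous pixel coordinates
        u = np.floor(pix[:, 0] / pix[:, 2]).astype(int)
        v = np.floor(pix[:, 1] / pix[:, 2]).astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = front[inside]
        # Simplification: no occlusion test, so every projecting point gets a vote.
        np.add.at(votes, (idx, labels[v[inside], u[inside]]), 1)

    fused = votes.argmax(axis=1)
    fused[votes.sum(axis=1) == 0] = -1             # unseen points stay unlabeled
    return fused
```

The sketch also shows the dependency noted in the passage: all semantic reasoning happens in the 2D label maps, and the 3D points serve only as a place to accumulate votes, so the point cloud must be supplied as input.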