Diffuser: Multi-View 2D-to-3D Label Diffusion for Semantic Scene Segmentation

Mascaro, Ruben; Teixeira, Lucas; Chli, Margarita

doi:10.1109/icra48506.2021.9561801

Cited by 20 publications

(5 citation statements)

References 28 publications

“…Other works propose alternatives to the probabilistic Bayesian update for fusing semantic labels from multi-view 2D images into a 3D map. Mascaro et al [34] build a sparse diffusion graph connecting 2D pixels to 3D points and 3D points to their K nearest neighbors to propagate labels from a 2D image segmentation to the 3D model. After graph construction, iterative multiplication of the label matrix with a probabilistic transition matrix yields the diffused semantic labels.…”

Section: Related Work (mentioning)

confidence: 99%

Real-time multi-modal semantic fusion on unmanned aerial vehicles with label propagation for cross-domain adaptation

Bultmann, Quenzel, Behnke

2023

Robotics and Autonomous Systems
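
The label diffusion summarized in the citation statement above can be illustrated with a small NumPy sketch. This is a toy reading of that one-paragraph description, not the authors' implementation: the function names, the brute-force nearest-neighbour search, the uniform edge weights, and the choice to make labelled pixels absorbing nodes are all assumptions made for illustration.

```python
import numpy as np

def build_transition_matrix(points, pixel_to_point, k=2):
    """Row-stochastic transition matrix over the nodes [3D points | 2D pixels].

    Nodes 0..n-1 are 3D points, nodes n..n+m-1 are labelled pixels.
    pixel_to_point[i] is the index of the 3D point observed by pixel i.
    Assumption: uniform edge weights and absorbing pixel nodes.
    """
    n, m = len(points), len(pixel_to_point)
    W = np.zeros((n + m, n + m))
    # 3D point -> its k nearest 3D neighbours (brute force, toy-sized input).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    for i in range(n):
        for j in np.argsort(dists[i])[:k]:
            W[i, j] = 1.0
    # 3D point -> every pixel that observes it.
    for pix, pt in enumerate(pixel_to_point):
        W[pt, n + pix] = 1.0
    # Pixels keep their own label (self-loop only).
    W[n:, n:] = np.eye(m)
    return W / W.sum(axis=1, keepdims=True)

def diffuse_labels(P, pixel_labels, num_classes, n_points, iters=50):
    """Repeatedly multiply the label matrix by the transition matrix P."""
    m = len(pixel_labels)
    Z = np.zeros((n_points + m, num_classes))
    Z[n_points + np.arange(m), pixel_labels] = 1.0  # one-hot pixel labels
    for _ in range(iters):
        Z = P @ Z
    scores = Z[:n_points]
    return scores.argmax(axis=1), scores

# Two well-separated clusters of 3D points; one labelled pixel per cluster.
pts = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0],
                [10, 0, 0], [11, 0, 0], [12, 0, 0]], dtype=float)
P = build_transition_matrix(pts, pixel_to_point=[0, 5], k=2)
labels, scores = diffuse_labels(P, pixel_labels=[0, 1], num_classes=2, n_points=6)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this sketch each cluster inherits the label of the single pixel that observes one of its points, because label mass can only flow along graph edges. A real system would use a sparse matrix and a spatial index for the neighbour search.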

“…It is worthwhile to mention that our method is not limited to a specific neural field method and can be extended easily to faster [19,40,15] and better-quality NeRFs [2,24]. Semantic segmentation in 3D: Semantic segmentation in 3D has been studied using multi-view fusion-based representations [1,22,11,20,39,21,44,52] that require only 2D supervision when training, and a separate 3D mesh at testing time, unlike implicit methods like ours. Recently, there have been promising attempts to recover 3D semantic maps from 2D inputs using NeRFs.…”

Section: Related Work (mentioning)

confidence: 99%

LaTeRF: Label and Text Driven Object Radiance Fields

Mirzaei, Kant, Kelly et al. 2022

Preprint

Obtaining 3D object representations is important for creating photo-realistic simulations and for collecting AR and VR assets. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point-labels of object and non-object points in the input images. To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional 'objectness' probability at each 3D point. Additionally, we leverage the rich latent space of a pre-trained CLIP model combined with our differentiable object renderer, to inpaint the occluded parts of the object. We demonstrate high-fidelity object extraction on both synthetic and real-world datasets and justify our design choices through an extensive ablation study.

“…Many methods use one data modality to supervise or inform another [1,3,37,38,42,48,65,68,69,76,93,98,104,130,148]. For 3D semantic segmentation, multiview fusion [2,55,73,84,86,89,133,133,145] is a popular family of methods that require only image supervision. However, these methods reason exclusively in the image domain and require an input 3D substrate such as a point cloud or polygonal mesh on which to aggregate 2D information.…”

Section: Related Work (mentioning)

confidence: 99%

NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

Vora, Radwan, Greff et al. 2021

Preprint

We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in implicit neural scene representations wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields upon which we then train a 3D semantic segmentation model supervised by posed 2D semantic maps. Despite being trained on 2D signals alone, our method is able to generate 3D-consistent semantic maps from novel camera poses and can be queried at arbitrary 3D points. Notably, NeSF is compatible with any method producing a density field, and its accuracy improves as the quality of the density field improves. Our empirical analysis demonstrates comparable quality to competitive 2D and 3D semantic segmentation baselines on complex, realistically rendered synthetic scenes. Our method is the first to offer truly dense 3D scene segmentations requiring only 2D supervision for training, and does not require any semantic input for inference on novel scenes. We encourage the readers to visit the project website.
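
The multiview fusion family that the NeSF citation statement contrasts against aggregates 2D label predictions onto an existing 3D substrate. A minimal sketch of one simple instance, majority voting over a point cloud, is given below. It is not taken from any of the cited papers: the function name, the dictionary keys, the pinhole camera model, and the lack of occlusion handling are all assumptions made for illustration.

```python
import numpy as np

def fuse_views(points, views, num_classes):
    """Accumulate per-point class votes from several labelled 2D views.

    Each view is a dict with hypothetical keys:
      'K' (3x3 intrinsics), 'R' (3x3 rotation), 't' (3,) translation,
      'labels' (H x W integer label image).
    Simplification: no occlusion test, so every point that projects
    inside an image receives a vote from that image.
    """
    votes = np.zeros((len(points), num_classes), dtype=int)
    for view in views:
        cam = points @ view["R"].T + view["t"]   # world -> camera frame
        in_front = cam[:, 2] > 1e-9
        z = np.where(in_front, cam[:, 2], 1.0)   # avoid dividing by zero
        pix = cam @ view["K"].T                  # pinhole projection
        u = np.round(pix[:, 0] / z).astype(int)  # column index
        v = np.round(pix[:, 1] / z).astype(int)  # row index
        h, w = view["labels"].shape
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.flatnonzero(visible)
        np.add.at(votes, (idx, view["labels"][v[idx], u[idx]]), 1)
    # Points never observed by any view get the label -1.
    fused = np.where(votes.sum(axis=1) > 0, votes.argmax(axis=1), -1)
    return fused, votes

# Toy setup: three views from the same pose, one of them mislabelled.
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
pose = {"K": K, "R": np.eye(3), "t": np.zeros(3)}
good = np.tile(np.array([0, 1, 2]), (3, 1))      # label equals column index
noisy = np.ones((3, 3), dtype=int)               # everything labelled 1
views = [dict(pose, labels=good), dict(pose, labels=noisy), dict(pose, labels=good)]
points = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
fused, votes = fuse_views(points, views, num_classes=3)
print(fused.tolist())  # [1, 2, 0, -1]
```

The last point lies behind the camera in every view and so stays unlabelled, which shows why this family of methods depends on the coverage and quality of the input 3D substrate.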
