2021
DOI: 10.48550/arxiv.2111.13260
Preprint

NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

Abstract: We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in implicit neural scene representations wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields upon which we then train a 3D semantic segmentation model supervised by posed 2D semantic maps. Despite being trained on 2D signals alone, our method is able to generate 3D-consistent semantic maps…
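The abstract describes supervising a 3D semantic model purely through posed 2D semantic maps, rendered against a previously recovered density field. As a rough illustration of that idea (a minimal sketch, not the authors' code; the function name, sampling setup, and class count are assumptions), the snippet below alpha-composites per-sample class probabilities along one camera ray using NeRF-style rendering weights and applies a 2D cross-entropy loss:

```python
# Hedged sketch of 2D-supervised 3D semantics: composite per-point class
# probabilities along a ray with NeRF-style weights, then penalize the
# rendered 2D distribution against a posed 2D semantic label.
import numpy as np

def render_semantics(sigmas, logits, deltas):
    """sigmas: (S,) densities from a pre-trained (frozen) density field.
    logits: (S, C) class logits from a 3D semantic model at the ray samples.
    deltas: (S,) distances between consecutive samples along the ray."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance to each sample
    weights = trans * alphas                                          # rendering weights
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)                         # per-sample softmax
    return weights @ probs                                            # (C,) rendered class distribution

# Toy usage: 64 samples on one ray, 5 classes, ground-truth class 2 from a 2D semantic map.
rng = np.random.default_rng(0)
sigmas = rng.uniform(0.0, 2.0, 64)
logits = rng.normal(size=(64, 5))
deltas = np.full(64, 0.05)
p = render_semantics(sigmas, logits, deltas)
loss = -np.log(p[2] + 1e-8)   # 2D cross-entropy; gradients flow back to the per-point logits
print(p, loss)
```

Because only the composited 2D distribution is penalized, the gradient reaches the per-point logits, which is what allows 2D labels alone to shape a 3D-consistent semantic field.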

Cited by 9 publications (14 citation statements)
References: 99 publications
“…We demonstrate two representative baselines for 2D image and 3D point cloud segmentation: DeepLab [15] and SparseConvNet [35], respectively. In addition, we compare these methods with NeSF [97], a method for dense 2D and 3D scene segmentation from posed RGB images. We train all methods with semantic supervision derived from 9 cameras per scene from 500 scenes and hold out 4 cameras per scene from the remaining 25 scenes for evaluation.…”
Section: Scene Semantic Segmentation (mentioning)
confidence: 99%
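The quoted protocol (2D semantic supervision from 9 cameras for each of 500 training scenes, with 4 held-out cameras for each of the remaining 25 scenes used for evaluation) can be summarized with a small, purely illustrative split builder; the scene and camera identifiers and the total camera count per scene below are assumptions, not values from the cited work:

```python
# Hedged sketch of the described train/eval camera split (identifiers assumed).
import random

def make_split(n_train_scenes=500, n_eval_scenes=25,
               cams_per_scene=13, train_cams=9, eval_cams=4, seed=0):
    rng = random.Random(seed)
    scenes = [f"scene_{i:04d}" for i in range(n_train_scenes + n_eval_scenes)]
    rng.shuffle(scenes)
    train, evaluation = {}, {}
    for s in scenes[:n_train_scenes]:
        train[s] = rng.sample(range(cams_per_scene), train_cams)       # supervision cameras
    for s in scenes[n_train_scenes:]:
        evaluation[s] = rng.sample(range(cams_per_scene), eval_cams)   # held-out evaluation cameras
    return train, evaluation

train_views, eval_views = make_split()
print(len(train_views), len(eval_views))   # 500, 25
```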
“…NeSF, on the other hand, must infer 3D geometry and semantics from posed 2D images alone. Further results and comparison to NeSF are presented in [97].…”
Section: Scene Semantic Segmentation (mentioning)
confidence: 99%
“…Nevertheless, in 3D scene editing, similar capabilities are still limited due to the high demand for multi-view consistency. Existing approaches either rely on laborious annotation [28,73,75,78], only support object deformation or translation [32,65,67,78], or only perform global style transfer [12,13,16,21,79] without strong semantic meaning. Recently, 3D-aware GANs [8,9,18,25,48,60,63] and semantic NeRF editing [37,68] learn a category-level latent space and enable editing via latent-code control.…”
Section: Related Work (mentioning)
confidence: 99%
“…In particular, the neural radiance field (NeRF) and its variants (Mildenhall et al., 2020a; Barron et al., 2021) adopt multi-layer perceptrons (MLPs) to learn continuous representations and use calibrated multi-view images to render unseen views with fine-grained detail. Beyond rendering quality, scene understanding has also been explored by several recent works (Vora et al., 2021; Yang et al., 2021; Zhi et al., 2021). Nevertheless, they either require dense view annotations to train a heavy 3D backbone for capturing semantic representations (Vora et al., 2021; Yang et al., 2021), or necessitate human intervention to provide sparse semantic labels (Zhi et al., 2021).…”
mentioning
confidence: 99%
“…Beyond rendering quality, scene understanding has also been explored by several recent works (Vora et al., 2021; Yang et al., 2021; Zhi et al., 2021). Nevertheless, they either require dense view annotations to train a heavy 3D backbone for capturing semantic representations (Vora et al., 2021; Yang et al., 2021), or necessitate human intervention to provide sparse semantic labels (Zhi et al., 2021). Recent self-supervised object discovery approaches on neural radiance fields (Yu et al., 2021c; Stelzner et al., 2021) try to decompose objects from given scenes on synthetic indoor data.…”
mentioning
confidence: 99%