2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.592

Semantically Coherent Co-Segmentation and Reconstruction of Dynamic Scenes

Abstract: In this paper we propose a framework for spatially and temporally coherent semantic co-segmentation and reconstruction of complex dynamic scenes from multiple static or moving cameras. Semantic co-segmentation exploits the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance. We demonstrate that semantic coherence results in improved segmentation and reconstruction for …

Cited by 47 publications
(45 citation statements)
References 58 publications
“…Previous studies on wide baseline human performance capture methods [24,33] either use initial sparse reconstruction or visual hulls generated from multi cameras to limit the stereo search space. Other methods for wide baseline semantic reconstruction exploit semantic segmentation constraints to improve the multi-view stereo [23]. In this study, we propose to exploit semantic masking in the stereo matching framework to limit the search region along the Epipolar line to decrease the number of wrong matches from only two camera views.…”
Section: Semantic Stereo Constraints (mentioning)
confidence: 99%
“…While correspondence estimation has been extensively studied, there has been a growing trend to extend the idea of matching the same objects across images to matching images covering different instances of an object category. This progress not only attracts substantial attention but also facilitates many real-world applications ranging from object recognition [26], object co-segmentation [2,12,35], to 3D reconstruction [29]. However, due to the presence of background clutter, ambiguity induced by large intra-class variations, and the limited scalability of obtaining large-scale datasets with manually annotated correspondences, semantic matching remains challenging.…”
Section: Introduction (mentioning)
confidence: 99%
“…Existing multi-task methods for scene understanding perform per frame joint reconstruction and semantic instance segmentation from a single image [25], showing that joint estimation can improve each task. Other methods have fused semantic segmentation with reconstruction [36] or flow estimation [42] demonstrating significant improvement in both semantic segmentation and reconstruction/scene flow. We exploit the joint estimation to understand dynamic scenes by simultaneous reconstruction, flow and segmentation estimation from multiple view video.…”
Section: Introduction (mentioning)
confidence: 99%
“…However methods in these three categories do not exploit semantic information of the scene. The fourth category of joint estimation methods exploit semantic information by introducing joint semantic segmentation and reconstruction for general dynamic scenes [19,56,27,49,36] and street scenes [13,50]. However these methods give per-frame semantic segmentation and reconstruction with no motion estimate leading to unaligned geometry and pixel level incoherence in both segmentation and reconstruction for dynamic sequences.…”
Section: Introduction (mentioning)
confidence: 99%