2011 International Conference on Computer Vision
DOI: 10.1109/iccv.2011.6126233

Real-time indoor scene understanding using Bayesian filtering with motion cues

Abstract: We present a method whereby an embodied agent using visual perception can efficiently create a model of a local indoor environment from its experience of moving within it. Our method uses motion cues to compute likelihoods of indoor structure hypotheses, based on simple, generic geometric knowledge about points, lines, planes, and motion. We present a single-image analysis, not to attempt to identify a single accurate model, but to propose a set of plausible hypotheses about the structure of the environment fr…
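As a rough illustration of the filtering idea described in the abstract, here is a minimal sketch of a discrete Bayesian filter over competing scene-structure hypotheses. The hypothesis count and likelihood values are made up, and the likelihood function the paper derives from point, line, plane, and motion geometry is abstracted into plain numbers.

```python
import numpy as np

def bayes_update(posterior, likelihood):
    """One recursive Bayes step over a fixed hypothesis set:
    posterior ∝ prior × P(observed feature motion | hypothesis)."""
    posterior = posterior * likelihood
    return posterior / posterior.sum()

# Hypothetical example: three candidate room layouts, uniform prior.
posterior = np.full(3, 1.0 / 3.0)

# Per-frame likelihoods: how well each hypothesis explains the 2D
# motion of tracked features (placeholder values, not from the paper).
for likelihood in [np.array([0.9, 0.4, 0.1]),
                   np.array([0.8, 0.3, 0.2])]:
    posterior = bayes_update(posterior, likelihood)

print(posterior)  # belief concentrates on the best-explaining hypothesis
```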

Cited by 59 publications (53 citation statements) | References 24 publications
“…This is because both proposed point sampling techniques ensure that points in each sample are chosen with high likelihood from the same planar surface, and that only one model per sample is built. Table 1 reports ARI results (larger is better) on sequences from three different data sets: the Michigan indoor (Mich:) data set [31], the Michigan-Milan (MM:) data set [7], and a new data set collected on our campus (New:). The algorithms considered are: T-linkage (T-L), Manhattan World-constrained T-linkage (MW), Weak Manhattan World-constrained T-linkage (WMW) [26], MW with samples constrained by the orientation map (MW-OM), and MW with samples from regions grown around each point (MW-RS).…”
Section: Results
confidence: 99%
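For readers unfamiliar with the metric, the ARI scores in Table 1 measure how well estimated plane memberships agree with ground truth. A minimal sketch using scikit-learn, with made-up labels:

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical plane-membership labels for 8 tracked feature points.
ground_truth = [0, 0, 0, 1, 1, 1, 2, 2]   # true planar surface per point
estimated    = [0, 0, 1, 1, 1, 1, 2, 2]   # labels from a multi-model fit

# ARI is 1.0 for a perfect grouping, ~0.0 for a random one (larger is better).
print(adjusted_rand_score(ground_truth, estimated))
```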
“…Images in this data set contain an average of 2.5 visible planes, with 36 feature points per plane on average. (2) The Michigan indoor data set [31], for which camera calibration was available (although substantial residual radial distortion had to be removed via manual calibration). On average, 4.5 planar surfaces are visible in each image, and 33 feature points were detected per plane.…”
Section: Data Sets
confidence: 99%
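A brief sketch of the kind of radial-distortion removal mentioned above, using OpenCV; the intrinsics and distortion coefficients here are placeholders, not the actual calibration recovered for [31].

```python
import numpy as np
import cv2

# Placeholder intrinsics K and distortion coefficients (k1, k2, p1, p2, k3).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
dist = np.array([-0.25, 0.07, 0.0, 0.0, 0.0])

# Distorted pixel coordinates of tracked features, shape (N, 1, 2) as cv2 expects.
pts = np.array([[[100.0, 120.0]], [[400.0, 300.0]]], dtype=np.float32)

# Map to undistorted pixel coordinates (P=K re-projects back into pixels).
undistorted = cv2.undistortPoints(pts, K, dist, P=K)
print(undistorted.reshape(-1, 2))
```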
“…[3] already demonstrated that 3D superpixels can be reconstructed in real time. [12,36] estimated a multiview layout, without labeling the image, in real time. Finally, although there is no experimental evidence of real-time performance for Data-Driven Primitives, the method consists of HOG feature extraction and SVM classification.…”
Section: Results
confidence: 99%
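To make that pipeline concrete, a toy sketch of HOG feature extraction followed by SVM classification; the random patches and labels are synthetic stand-ins, not the actual primitive detector.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic 64x64 grayscale patches standing in for real training data.
patches = rng.random((40, 64, 64))
labels = rng.integers(0, 2, size=40)  # hypothetical binary primitive labels

# Extract one HOG descriptor per patch, then train a linear SVM on them.
features = np.array([hog(p, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for p in patches])
clf = LinearSVC().fit(features, labels)
print(clf.predict(features[:5]))
```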
“…[14,16,37] used the Manhattan assumption to fill textureless gaps in sparse 3D reconstructions. [25,3,4] and [12,36] have used super-pixels and indoor scene understanding, respectively, to fill textureless gaps in sparse 3D reconstructions. Our contribution is to fuse the previously mentioned cues and a new one, data-driven primitives, in a dense variational formulation.…”
Section: Data-Driven Depth Cues
confidence: 99%
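A toy sketch of the general shape of such a dense formulation (not the authors' actual energy): fill gaps in a sparse depth map by gradient descent on a data-fidelity term plus a quadratic smoothness term standing in for the variational regularizer.

```python
import numpy as np

def fuse_depth(sparse, mask, lam=0.1, iters=500, step=0.2):
    """Minimize sum(mask * (d - sparse)^2) + lam * |grad d|^2 by
    gradient descent; quadratic smoothness stands in for a TV prior."""
    d = np.where(mask, sparse, sparse[mask].mean())
    for _ in range(iters):
        # Discrete Laplacian is the (negative) gradient of the smoothness term.
        lap = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
               np.roll(d, 1, 1) + np.roll(d, -1, 1) - 4.0 * d)
        grad = 2.0 * mask * (d - sparse) - 2.0 * lam * lap
        d = d - step * grad
    return d

# Hypothetical 32x32 scene: a slanted wall, depth known at ~10% of pixels.
rng = np.random.default_rng(1)
truth = np.fromfunction(lambda i, j: 1.0 + 0.02 * j, (32, 32))
mask = rng.random((32, 32)) < 0.1
dense = fuse_depth(np.where(mask, truth, 0.0), mask)
print(np.abs(dense - truth).mean())  # mean depth error after fusion
```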
“…For an on-line mobile agent that perceives its local environment through a temporally continuous stream of images (e.g., a video), Tsai et al. [11] generate a set of hypotheses from the first frame of the video and use a Bayesian filter to evaluate the hypotheses on-line based on their ability to explain the 2D motions of a set of tracked features. Tsai and Kuipers [10] extended this real-time scene understanding method to generate child hypotheses on-line from existing hypotheses to describe the scene in more detail.…”
Section: Introduction
confidence: 99%
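Continuing the Bayes-filter sketch after the abstract above, the on-line refinement in [10] can be caricatured as replacing a coarse hypothesis with more detailed children that inherit its posterior mass (a hypothetical splitting rule, not the paper's):

```python
def spawn_children(posteriors, parent_idx, n_children):
    """Replace hypothesis parent_idx with n_children refinements,
    dividing the parent's posterior mass equally among them."""
    mass = posteriors[parent_idx] / n_children
    kept = [p for i, p in enumerate(posteriors) if i != parent_idx]
    return kept + [mass] * n_children

print(spawn_children([0.7, 0.2, 0.1], parent_idx=0, n_children=2))
# [0.2, 0.1, 0.35, 0.35] -- children split the parent's belief
```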