Overcoming Occlusion with Inverse Graphics

Moreno, Pol; Williams, Christopher K. I.; Nash, Charlie; Kohli, Pushmeet

doi:10.1007/978-3-319-49409-8_16

Cited by 28 publications

(19 citation statements)

References 30 publications

“…Analysis-by-synthesis approaches to computer vision. A long line of work has interpreted computer vision as the inverse problem to computer graphics [25,45,30,26]. This 'analysis-by-synthesis' approach has been used for various tasks including character recognition, CAPTCHA-breaking, lane detection, object pose estimation, and human pose estimation [46,41,31,34,21,35]. To our knowledge, our work is the first to use an analysis-by-synthesis approach to infer a hierarchical 3D object-based representation of real multi-object scenes while exploiting inductive biases about the contacts between objects.…”

Section: Related Work (mentioning)

3DP3: 3D Scene Perception via Probabilistic Programming

Gothoskar, Cusumano-Towner, Zinberg, et al. (2021)

Preprint

We present 3DP3, a framework for inverse graphics that uses inference in a structured generative model of objects, scenes, and images. 3DP3 uses (i) voxel models to represent the 3D shape of objects, (ii) hierarchical scene graphs to decompose scenes into objects and the contacts between them, and (iii) depth image likelihoods based on real-time graphics. Given an observed RGB-D image, 3DP3's inference algorithm infers the underlying latent 3D scene, including the object poses and a parsimonious joint parametrization of these poses, using fast bottom-up pose proposals, novel involutive MCMC updates of the scene graph structure, and, optionally, neural object detectors and pose estimators. We show that 3DP3 enables scene understanding that is aware of 3D shape, occlusion, and contact structure. Our results demonstrate that 3DP3 is more accurate at 6DoF object pose estimation from real images than deep learning baselines and shows better generalization to challenging scenes with novel viewpoints, contact, and partial observability.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

“…where i indexes pixels of the depth image. Pixels whose ray does not intersect an object are assigned the maximum depth value D. A similar likelihood function on depth images was used in [34].…”

Section: Parsing Scenes With Fully Occluded Objects and Number Uncert... (mentioning)

3DP3: 3D Scene Perception via Probabilistic Programming

Gothoskar, Cusumano-Towner, Zinberg, et al. (2021)

Preprint
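The 3DP3 excerpt above describes a per-pixel likelihood on depth images in which a pixel whose ray misses every object is assigned the maximum depth D. The excerpt does not state the per-pixel noise model, so the Python sketch below assumes independent Gaussian noise with a hypothetical standard deviation `sigma`; the names `render_depth` and `depth_log_likelihood` are illustrative and are not taken from 3DP3 or from [34].

```python
import math


def render_depth(hit_depths, max_depth):
    """Build a rendered depth image: a pixel whose ray hits no object (None) gets max_depth D."""
    return [max_depth if d is None else d for d in hit_depths]


def depth_log_likelihood(observed, hit_depths, max_depth, sigma=0.01):
    """Sum over pixels i of log N(observed[i] | rendered[i], sigma^2).

    ASSUMPTION: independent Gaussian per-pixel noise with a hypothetical sigma;
    the quoted excerpt does not specify the noise model.
    """
    rendered = render_depth(hit_depths, max_depth)
    log_norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2)  # Gaussian normalising constant
    return sum(log_norm - 0.5 * ((o - r) / sigma) ** 2
               for o, r in zip(observed, rendered))
```

Under this assumption, a scene hypothesis whose rendered depth matches the observation at every pixel, including background pixels set to D, scores higher than one that places an object where the sensor saw background.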

“…Zienkiewicz et al. [12] used render-and-compare for real-time height mapping fusion. Several recent works used render-and-compare for solving a wide range of vision problems: Tewari et al. [13] learned unsupervised monocular face reconstruction; Kundu et al. [14] introduced a framework for instance-level 3D scene understanding; Moreno et al. [15] estimated 6D object pose in cluttered synthetic scenes. More closely related is the DeepIM method by Li et al. [16], who formulated 6D object pose estimation as an iterative pose refinement process that refines the initial pose by trying to match the rendered image with the observed image.…”

Section: Related Work (mentioning)

Refining 6D Object Pose Predictions using Abstract Render-and-Compare

Periyasamy, Schwarz, Behnke (2019)

2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids)

Robotic systems often require precise scene analysis capabilities, especially in unstructured, cluttered situations such as those occurring in human-made environments. While current deep-learning-based methods yield good estimates of object poses, they often struggle with large amounts of occlusion and do not take inter-object effects into account. Vision as inverse graphics is a promising concept for detailed scene analysis. A key element of this idea is a method for inferring scene parameter updates from the rasterized 2D scene. However, the rasterization process is notoriously difficult to invert, not only because of the projection and occlusion processes but also because of secondary effects such as lighting or reflections. We propose to remove the latter from the process by mapping the rasterized image into an abstract feature space learned in a self-supervised way from pixel correspondences. Using only a lightweight inverse rendering module, this allows us to refine 6D object pose estimates in highly cluttered scenes by optimizing a simple pixel-wise difference in the abstract image representation. We evaluate our approach on the challenging YCB-Video dataset, where it yields large improvements and demonstrates a large basin of attraction towards the correct object poses.

[Figure: abstraction modules, differentiable renderer, pose P, loss L]
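Both the DeepIM description quoted above and this abstract rely on render-and-compare: render the scene under the current pose estimate, compare the result with the observation, and update the pose to reduce the difference. The Python sketch below is a deliberately tiny 1D illustration under stated assumptions: a hypothetical `render` that draws a bar of fixed width, a raw pixel-wise squared difference (no learned abstract feature space), and greedy integer pose steps in place of gradient-based or learned updates.

```python
def render(pos, width=3, size=20):
    """Hypothetical toy renderer: a 1D binary image with a bar of `width` pixels starting at `pos`."""
    return [1.0 if pos <= i < pos + width else 0.0 for i in range(size)]


def pixel_loss(a, b):
    """Pixel-wise squared difference between two images of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_pose(observed, init_pos, max_iters=100):
    """Greedy render-and-compare: move to a neighbouring pose only if it lowers the loss."""
    pos = init_pos
    best = pixel_loss(render(pos), observed)
    for _ in range(max_iters):
        # Render and compare the two neighbouring poses.
        loss, cand = min((pixel_loss(render(p), observed), p) for p in (pos - 1, pos + 1))
        if loss >= best:
            break  # no improvement: local minimum or flat region of the loss
        pos, best = cand, loss
    return pos
```

With `observed = render(10)`, starting two pixels away converges to 10, while starting at 2, where the rendered bar does not overlap the observed one, every neighbouring pose has the same loss and the search stops immediately. This flat-loss behaviour of raw pixel differences is one way to read the abstract's emphasis on a large basin of attraction.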

“…It involves a considerably more complicated simulator that takes as input a set of 20 parameters and deterministically renders an image of an object (in this case, a teapot) on a uniform background. This is based on the generative model used by Moreno et al. (2016). We focus on the task of learning the posterior distribution of two colour parameters in a setting where there are two possible explanations for the observed image and thus the posterior is expected to be bi-modal.…”

Section: Experiments 3: Gradient-free ROMC (mentioning)

“…• Five illumination parameters that characterise the lighting on the object. Unlike Moreno et al. (2016), who use spherical harmonics to model illumination, we use single-source directional lighting as it is more intuitive and natural.…”

Section: F Additional Information For Exp (mentioning)

Robust Optimisation Monte Carlo

Ikonomov, Gutmann (2019)

Preprint

This paper is on Bayesian inference for parametric statistical models that are defined by a stochastic simulator which specifies how data is generated. Exact sampling is then possible but evaluating the likelihood function is typically prohibitively expensive. Approximate Bayesian Computation (ABC) is a framework to perform approximate inference in such situations. While basic ABC algorithms are widely applicable, they are notoriously slow and much research has focused on increasing their efficiency. Optimisation Monte Carlo (OMC) has recently been proposed as an efficient and embarrassingly parallel method that leverages optimisation to accelerate the inference. In this paper, we demonstrate an important previously unrecognised failure mode of OMC: It generates strongly overconfident approximations by collapsing regions of similar or near-constant likelihood into a single point. We propose an efficient, robust generalisation of OMC that corrects this. It makes fewer assumptions, retains the main benefits of OMC, and can be performed either as post-processing to OMC or as a stand-alone computation. We demonstrate the effectiveness of the proposed Robust OMC on toy examples and tasks in inverse-graphics where we perform Bayesian inference with a complex image renderer.
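Optimisation Monte Carlo, as summarised in this abstract, uses optimisation to speed up likelihood-free inference: fix the simulator's random inputs u_i, solve for the parameters θ_i that reproduce the observed data, and weight each θ_i by its prior density divided by a Jacobian term of the simulator with respect to θ. The Python sketch below shows that basic OMC recipe, not the proposed Robust OMC, on a hypothetical linear simulator f(θ, u) = aθ + u with u ~ N(0, 1) and a N(0, prior_sd²) prior. This toy is chosen because the optimisation has a closed-form solution and the exact posterior mean a·y / (a² + 1/prior_sd²) is available for comparison. The function name `omc_linear` and all parameter values are illustrative assumptions.

```python
import math
import random


def omc_linear(y_obs, a=2.0, prior_sd=10.0, n=100_000, seed=0):
    """Basic OMC posterior mean for the hypothetical simulator f(theta, u) = a*theta + u, u ~ N(0, 1).

    For each nuisance draw u_i, solve f(theta, u_i) = y_obs (closed form here),
    then weight theta_i by prior(theta_i) / |df/dtheta|.
    """
    rng = random.Random(seed)
    thetas, weights = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                            # fix the simulator's randomness
        theta = (y_obs - u) / a                            # "optimisation" step, exact here
        prior = math.exp(-0.5 * (theta / prior_sd) ** 2)   # unnormalised N(0, prior_sd^2) density
        thetas.append(theta)
        weights.append(prior / abs(a))                     # Jacobian correction |df/dtheta| = |a|
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, thetas)) / total
```

Here every u_i yields a single exact optimum and the Jacobian is constant, so the sketch does not exhibit the failure mode the abstract describes (regions of near-constant likelihood collapsed into a single point); it only illustrates the baseline that Robust OMC generalises.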
