Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions

Steenkiste, Sjoerd van; Chang, Michael; Greff, Klaus; Schmidhuber, Jürgen

doi:10.48550/arxiv.1802.10353

Cited by 37 publications

(61 citation statements)

References 30 publications


“…OCGMs are typically formulated either as autoencoders (e.g. [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26]) or generative adversarial networks (GANs) (e.g. [27][28][29][30][31][32][33][34]).…”

Section: Related Work (mentioning)

confidence: 99%

“…[7][8][9][10][11][12][13]), while others directly infer segmentation masks (e.g. [14][15][16][17][18][19][20][21][22][23]). STNs can explicitly disentangle object location by cropping out a rectangular region from an input, allowing object appearance to be modelled in a canonical pose.…”

Section: Related Work (mentioning)

confidence: 99%

“…[14,15]) or iterative refinement (e.g. [17][18][19][20][21][22]) to infer object representations from an image. RNN based models need to learn a fixed strategy that sequentially attends to different regions in an image, but this imposes an unnatural ordering on objects in an image.…”

Section: Related Work (mentioning)

confidence: 99%

“…[7][8][9][10][11][12][13]), others directly predict pixel-wise instance segmentation masks (e.g. [14][15][16][17][18][19][20][21][22]). The latter avoids the use of fixed-size sampling grids which are ill-suited for objects of varying size.…”

Section: Introduction (mentioning)

confidence: 99%

“…Instead, object representations are inferred either by iteratively refining a set of randomly initialised representations (e.g. [17][18][19][20][21][22]) or by using a recurrent neural networks (RNN) (e.g. [14][15][16]).…”

Section: Introduction (mentioning)

confidence: 99%


GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement

Engelcke, Jones, Posner. 2021. Preprint.


Advances in object-centric generative models (OCGMs) have culminated in the development of a broad range of methods for unsupervised object segmentation and interpretable object-centric scene generation. These methods, however, are limited to simulated and real-world datasets with limited visual complexity. Moreover, object representations are often inferred using RNNs, which do not scale well to large images, or iterative refinement, which avoids imposing an unnatural ordering on objects in an image but requires the a priori initialisation of a fixed number of object representations. In contrast to established paradigms, this work proposes an embedding-based approach in which embeddings of pixels are clustered in a differentiable fashion using a stochastic, non-parametric stick-breaking process. Similar to iterative refinement, this clustering procedure also leads to randomly ordered object representations, but without the need to initialise a fixed number of clusters a priori. This is used to develop a new model, GENESIS-V2, which can infer a variable number of object representations without using RNNs or iterative refinement. We show that GENESIS-V2 outperforms previous methods for unsupervised image segmentation and object-centric scene generation on established synthetic datasets as well as more complex real-world datasets.


Latent State Inference in a Spatiotemporal Generative Model

Karlbauer, Menge, Otte, et al. 2021. Lecture Notes in Computer Science.


Knowledge of the hidden factors that determine particular system dynamics is crucial for both explaining them and pursuing goal-directed, interventional actions. The inference of these factors without supervision given time series data remains an open challenge. Here, we focus on spatio-temporal processes, including wave propagations and weather dynamics, and assume that universal causes (e.g. physics) apply throughout space and time. We apply a novel DIstributed, Spatio-Temporal graph Artificial Neural network Architecture, DISTANA, which learns a generative model in such domains. DISTANA requires fewer parameters and yields more accurate predictions than temporal convolutional neural networks and other related approaches on a 2D circular wave prediction task. We show that DISTANA, when combined with a retrospective latent state inference principle called active tuning, can reliably derive hidden local causal factors. In a current weather prediction benchmark, DISTANA infers our planet's land-sea mask solely by observing temperature dynamics and uses the self-inferred information to improve its own prediction of temperature. We are convinced that the retrospective inference of latent states in generative RNN architectures will play an essential role in future research on causal inference and explainable systems.

Introduction

When considering our planet's weather, centuries of past research have identified a large number of factors that affect its highly nonlinear and partially chaotic dynamics. Yet, can we ever be sure of having identified all hidden causal factors? Moreover, do we have (sufficient) data about them? These are fundamental questions in any prediction or forecasting task, including other spatio-temporal tasks such as soil property dynamics, traffic forecasting, energy-flow prediction (e.g. in brains or supply networks), or recommender systems. Here we investigate how unobservable hidden causes may be inferred from spatio-temporal data streams.


APEX: Unsupervised, Object-Centric Scene Segmentation and Tracking for Robot Manipulation

Jones, Engelcke, et al. 2021. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).


Recent advances in unsupervised learning for object detection, segmentation, and tracking hold significant promise for applications in robotics. A common approach is to frame these tasks as inference in probabilistic latent-variable models. In this paper, however, we show that the current state-of-the-art struggles with visually complex scenes such as typically encountered in robot manipulation tasks. We propose APEX, a new latent-variable model which is able to segment and track objects in more realistic scenes featuring objects that vary widely in size and texture, including the robot arm itself. This is achieved by a principled mask normalisation algorithm and a high-resolution scene encoder. To evaluate our approach, we present results on the real-world Sketchy dataset. This dataset, however, does not contain ground truth masks and object IDs for a quantitative evaluation. We thus introduce the Panda Pushing Dataset (P2D) which shows a Panda arm interacting with objects on a table in simulation and which includes ground-truth segmentation masks and object IDs for tracking. In both cases, APEX comprehensively outperforms the current state-of-the-art in unsupervised object segmentation and tracking. We demonstrate the efficacy of our segmentations for robot skill execution on an object arrangement task, where we also achieve the best or comparable performance among all the baselines.
