2020
DOI: 10.48550/arxiv.2012.05208
Preprint

On the Binding Problem in Artificial Neural Networks

Abstract: Contemporary neural networks still fall short of human-level generalization, which extends far beyond our direct experiences. In this paper, we argue that the underlying cause for this shortcoming is their inability to dynamically and flexibly bind information that is distributed throughout the network. This binding problem affects their capacity to acquire a compositional understanding of the world in terms of symbol-like entities (like objects), which is crucial for generalizing in predictable and systematic ways. …

Cited by 69 publications (99 citation statements)
References 168 publications (200 reference statements)
“…Systematic generalization (Fodor et al., 1988) is a desired property for neural networks: extrapolating compositional rules seen during training beyond the training distribution, for example by performing different combinations of known rules or by applying them to longer problems. Despite the progress of artificial neural networks in recent years, the problem of systematic generalization remains unsolved (Fodor and McLaughlin, 1990; Lake and Baroni, 2018; Liska et al., 2018; Greff et al., 2020; Hupkes et al., 2020). While there has been much progress in the past years (Bahdanau et al., 2019; Korrel et al., 2019; Lake, 2019; Li et al., 2019; Russin et al., 2019), in particular on the popular SCAN dataset (Lake and Baroni, 2018), where some methods even achieve 100% accuracy by introducing non-trivial symbolic components into the system (Liu et al., 2020), the flexibility of such solutions is questionable.…”
Section: Introduction (mentioning)
confidence: 99%
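To make the kind of split this statement refers to concrete, here is a small hypothetical Python sketch. The grammar, the PRIMITIVES/MODIFIERS tables, and the interpret helper are invented for illustration; this is not the SCAN dataset itself, only a miniature analogue of its compositional train/test split.

```python
# Hypothetical toy grammar (not the actual SCAN dataset): both the primitive
# "jump" and the modifiers are seen in training, but never together.
PRIMITIVES = {"walk": "WALK", "run": "RUN", "jump": "JUMP"}
MODIFIERS = {"twice": 2, "thrice": 3}

def interpret(command):
    """Map a command such as 'jump twice' to its action sequence."""
    words = command.split()
    action = PRIMITIVES[words[0]]
    repeat = MODIFIERS[words[1]] if len(words) > 1 else 1
    return [action] * repeat

all_commands = list(PRIMITIVES) + [
    f"{p} {m}" for p in PRIMITIVES for m in MODIFIERS
]
# Compositional (not random) split: hold out every modified "jump" command.
test = [c for c in all_commands if c.startswith("jump ")]
train = [c for c in all_commands if c not in test]

print("train:", [(c, interpret(c)) for c in train])
print("test :", [(c, interpret(c)) for c in test])
# A model generalizes systematically if it maps the held-out "jump twice"
# and "jump thrice" correctly after training only on the other commands.
```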
“…Object-centric models ("Slots" and "Capsules"): Objects are how people interact with the world and are therefore central to human scene understanding [Scholl, 2001, Spelke, 1990]. Visual objects are formed by (bottom-up) part-whole matching and Gestalt processes interacting with (top-down) objectness priors and knowledge of object categories [Greff et al., 2020, Vecera, 2000, Wagemans et al., 2012]. Object-centric models use these processes to discover objects and segregate their representations into different "slots" [Greff et al., 2020, Goyal et al., 2019].…”
Section: Models With Recurrent and Feedback Connections (mentioning)
confidence: 99%
“…Visual objects are formed by (bottom-up) part-whole matching and Gestalt processes interacting with (top-down) objectness priors and knowledge of object categories [Greff et al., 2020, Vecera, 2000, Wagemans et al., 2012]. Object-centric models use these processes to discover objects and segregate their representations into different "slots" [Greff et al., 2020, Goyal et al., 2019]. Attention mechanisms have played a major role in object-centric models by enabling the iterative discovery and representation of an object's properties.…”
Section: Models With Recurrent and Feedback Connections (mentioning)
confidence: 99%
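The two statements above describe attention iteratively routing features into competing "slots". The NumPy sketch below is a rough, untrained illustration of such a routing step, loosely in the spirit of slot-attention-style models: the projection matrices are random stand-ins for learned parameters, and the slot update is a plain weighted mean rather than the learned refinement used in the cited works.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def iterative_slot_attention(inputs, num_slots=4, iters=3, seed=0):
    """Route N input feature vectors (N, dim) into `num_slots` slot vectors.

    Random matrices stand in for learned projections, and the slot update is a
    simple weighted mean (no GRU/MLP refinement as in trained models).
    """
    n, dim = inputs.shape
    rng = np.random.default_rng(seed)
    w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

    slots = rng.normal(size=(num_slots, dim))     # random slot initialization
    k, v = inputs @ w_k, inputs @ w_v             # keys/values: (N, dim)

    for _ in range(iters):
        q = slots @ w_q                           # queries: (num_slots, dim)
        # Softmax over the slot axis makes slots compete for each input.
        attn = softmax(q @ k.T / np.sqrt(dim), axis=0)          # (num_slots, N)
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ v                       # weighted mean of values
    return slots, attn

features = np.random.default_rng(1).normal(size=(16, 32))  # e.g. 16 patch features
slots, attn = iterative_slot_attention(features)
print(slots.shape, attn.shape)                    # (4, 32) (4, 16)
```

Normalizing the attention over the slot axis, rather than over the inputs, is what makes the slots compete for input features and thereby segregate them into separate representations.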
“…and is crucial for generalizing in predictable and systematic ways. Object-centric representations have the potential to greatly improve sample efficiency, robustness, generalization to new tasks, and interpretability of machine learning algorithms (Greff et al., 2020). In this work, we focus on the aspect of modeling motion of objects from video, because of its synergistic relationship with object-centric representations: on the one hand, objects support learning an efficient dynamics model by factorizing the scene into approximately independent parts with only sparse interactions.…”
Section: Introduction (mentioning)
confidence: 99%
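The last sentence describes learning dynamics by factorizing the scene into per-object parts with sparse interactions. The following NumPy sketch is a hypothetical, untrained illustration of one such transition step; the function name and random weight matrices are inventions for this example, not the model of the quoted paper.

```python
import numpy as np

def factorized_dynamics_step(slots, w_self, w_pair):
    """One toy transition step for K object slots of dimension D.

    Each slot's next state comes from its own dynamics plus a sum of pairwise
    interaction messages, i.e. the scene is factorized into per-object parts
    with sparse interactions instead of one entangled scene vector.
    """
    k, d = slots.shape
    self_term = slots @ w_self                              # (K, D)
    # Build all (receiver, sender) pairs and compute one message per pair.
    receivers = np.repeat(slots, k, axis=0)                 # (K*K, D)
    senders = np.tile(slots, (k, 1))                        # (K*K, D)
    messages = np.concatenate([receivers, senders], axis=1) @ w_pair
    messages = messages.reshape(k, k, d)                    # [i, j] = msg j -> i
    mask = 1.0 - np.eye(k)[..., None]                       # drop self-messages
    interaction_term = (messages * mask).sum(axis=1)        # (K, D)
    return np.tanh(self_term + interaction_term)

rng = np.random.default_rng(0)
num_slots, dim = 4, 16
slots = rng.normal(size=(num_slots, dim))
w_self = rng.normal(size=(dim, dim)) / np.sqrt(dim)
w_pair = rng.normal(size=(2 * dim, dim)) / np.sqrt(2 * dim)
print(factorized_dynamics_step(slots, w_self, w_pair).shape)  # (4, 16)
```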