2016 · Preprint
DOI: 10.48550/arxiv.1606.06724

Tagger: Deep Unsupervised Perceptual Grouping

Abstract: We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. We enable a neural network to group the representations of different objects in an iterative manner through a differentiable mechanism. We achieve very fast convergence by allowing the system to amortize the joint iterative…
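The iterative grouping described in the abstract can be illustrated, under heavy simplification, by a soft-EM-style loop: each group produces a reconstruction of the input, and soft masks are re-estimated from per-group reconstruction errors. This is only a toy sketch of the idea, not the paper's method — the toy 1-D input, the analytic per-group mean (standing in for Tagger's learned denoising network), and the Gaussian-style reweighting are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image": two constant segments that should end up in different groups.
x = np.concatenate([np.full(10, 1.0), np.full(10, 5.0)])

K = 2                                        # number of groups
m = rng.dirichlet(np.ones(K), size=x.size)   # soft group masks, shape (20, K)

for _ in range(20):
    # Per-group reconstruction: mask-weighted mean of the input. In Tagger this
    # role is played by a learned denoiser (a ladder network), not a mean.
    z = (m * x[:, None]).sum(axis=0) / m.sum(axis=0)
    # Re-estimate masks: pixels are softly assigned to the group whose
    # reconstruction explains them best.
    err = (x[:, None] - z[None, :]) ** 2
    m = np.exp(-err)
    m /= m.sum(axis=1, keepdims=True)

groups = m.argmax(axis=1)
```

After a few iterations the masks separate the two segments, mirroring how the paper's iterative, differentiable mechanism converges quickly by amortizing the grouping computation across iterations.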

Cited by 5 publications (10 citation statements) · References 30 publications
“…This has been recently achieved with InfoGANs (Chen et al, 2016a), where structured latent variables are included as part of the noise vector, and the mutual information between these latent variables and the generator distribution is then maximised as a mini-max game between the two networks. Similarly, Tagger (Greff et al, 2016), which combines iterative amortized grouping and ladder networks, aims to perceptually group objects in images by iteratively denoising its inputs and assigning parts of the reconstruction to different groups. introduced a way to combine amortized inference with stochastic variational inference in an algorithm called structured VAEs.…”
Section: Related Work
confidence: 99%
“…Deep neural networks in particular have proven to be remarkably effective for supervised learning from large datasets using backpropagation [1,2]. Deep learning is therefore already a viable solution to the symbol grounding problem in the supervised case, and for the unsupervised case, which is essential for a full solution, rapid progress is being made [20,21,22,23,24]. The hybrid neural-symbolic reinforcement learning architecture we propose relies on a deep learning solution to the symbol grounding problem.…”
Section: Introduction
confidence: 99%
“…Our method is also closely related to recent work on deep probabilistic inference for scene decomposition. Most works formulate the problem as compositional generative models, in which a visual scene is represented by a set of latent codes that either correspond to localized object-centric patches [11,8,24,28,19] or scene mixture components [2,15,16,17,10]. The scene mixture models generate full-sized images for each latent code and blend them via attentional masks [2] in iterative variational inference frameworks.…”
Section: Related Work
confidence: 99%