Stacked Capsule Autoencoders

Kosiorek, Adam R.; Sabour, Sara; Teh, Yee Whye; Hinton, Geoffrey E.

doi:10.48550/arxiv.1906.06818

Cited by 8 publications

(7 citation statements)

References 17 publications

“…To improve biological plausibility, all computations in our model are local and all units are connected to the same small, local set of other units throughout learning and inference, which matches early visual cortex, in which the lateral connections that follow natural image statistics are implemented anatomically [4,26,48,59]. This in contrast to other ideas that require flexible pointers to arbitrary locations and features [as discussed by 53] or capsules that flexibly encode different parts of the input [9,36,50,51]. Nonetheless, we employ contrastive learning objectives and backpropagation here, for which we do not provide a biologically plausible implementations.…”

Section: Discussion (mentioning)

confidence: 90%

Unsupervised learning of features and object boundaries from local prediction

Schütt

Ma

2022

Preprint

A visual system has to learn both which features to extract from images and how to group locations into (proto-)objects. Those two aspects are usually dealt with separately, although predictability is discussed as a cue for both. To incorporate features and boundaries into the same model, we model a layer of feature maps with a pairwise Markov random field model in which each factor is paired with an additional binary variable, which switches the factor on or off. Using one of two contrastive learning objectives, we can learn both the features and the parameters of the Markov random field factors from images without further supervision signals. The features learned by shallow neural networks based on this loss are local averages, opponent colors, and Gabor-like stripe patterns. Furthermore, we can infer connectivity between locations by inferring the switch variables. Contours inferred from this connectivity perform quite well on the Berkeley segmentation database (BSDS500) without any training on contours. Thus, computing predictions across space aids both segmentation and feature learning, and models trained to optimize these predictions show similarities to the human visual system. We speculate that retinotopic visual cortex might implement such predictions over space through lateral connections.

“…A small body of work focuses on developing equivariant autoencoders. Several methods construct data and group-specific architectures to auto-encode data equivariantly, learning an equivariant representation in the process (Hinton et al, 2011;Kosiorek et al, 2019). Others use supervision to extract class-invariant and class-equivariant representations (Feige, 2022).…”

Section: Equivariant Representations Of Atomic Systems (mentioning)

confidence: 99%

Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space

Visani

Pun

Angaji

et al. 2022

Preprint

Group-equivariant neural networks have emerged as a data-efficient approach to solve classification and regression tasks, while respecting the relevant symmetries of the data. However, little work has been done to extend this paradigm to the unsupervised and generative domains. Here, we present Holographic-(V)AE (H-(V)AE), a fully end-to-end SO(3)-equivariant (variational) autoencoder in Fourier space, suitable for unsupervised learning and generation of data distributed around a specified origin. H-(V)AE is trained to reconstruct the spherical Fourier encoding of data, learning in the process a latent space with a maximally informative invariant embedding alongside an equivariant frame describing the orientation of the data. We extensively test the performance of H-(V)AE on diverse datasets and show that its latent space efficiently encodes the categorical features of spherical images and structural features of protein atomic environments. Our work can further be seen as a case study for equivariant modeling of a data distribution by reconstructing its Fourier encoding.

“…Capsule Network is first proposed in [22] and is improved in [7] and [12], which is designed for image features extraction. In general, Capsule Network can not only effectively fuse information from numerous elements into highly expressive representations without information loss, but also reveal the contributions from different elements to the representations by routing mechanism.…”

Section: Capsule Network (mentioning)

confidence: 99%

Graph Capsule Aggregation for Unaligned Multimodal Sequences

Mai

2021

Proceedings of the 2021 International Conference on Multimodal Interaction

Humans express their opinions and emotions through multiple modalities which mainly consist of textual, acoustic and visual modalities. Prior works on multimodal sentiment analysis mostly apply Recurrent Neural Network (RNN) to model aligned multimodal sequences. However, it is unpractical to align multimodal sequences due to different sample rates for different modalities. Moreover, RNN is prone to the issues of gradient vanishing or exploding and it has limited capacity of learning long-range dependency which is the major obstacle to model unaligned multimodal sequences. In this paper, we introduce Graph Capsule Aggregation (GraphCAGE) to model unaligned multimodal sequences with graph-based neural model and Capsule Network. By converting sequence data into graph, the previously mentioned problems of RNN are avoided. In addition, the aggregation capability of Capsule Network and the graph-based structure enable our model to be interpretable and better solve the problem of long-range dependency. Experimental results suggest that GraphCAGE achieves state-of-the-art performance on two benchmark datasets with representations refined by Capsule Network and interpretation provided.

CCS Concepts: Information systems → Multimedia streaming.
