An orthogonal Haar scattering transform is a deep network, computed with a hierarchy of additions, subtractions and absolute values over pairs of coefficients. It provides a simple mathematical model for unsupervised deep network learning. It implements non-linear contractions, which are optimized for classification with an unsupervised pair matching algorithm of polynomial complexity. A structured Haar scattering over graph data computes permutation-invariant representations of groups of connected points in the graph. If the graph connectivity is unknown, unsupervised Haar pair learning can provide a consistent estimation of connected dyadic groups of points. Classification results are given on image databases, defined on regular grids or graphs, with a connectivity which may be known or unknown.

Keywords: deep learning, neural network, scattering transform, Haar wavelet, classification, images, graphs

2000 Math Subject Classification: 68Q32, 68T45, 68Q25, 68T05

arXiv:1509.09187v1 [cs.LG] 30 Sep 2015

on general graphs. In social, sensor or transportation networks, high-dimensional data vectors are supported on a graph [28]. In most cases, propagation phenomena require defining translation-invariant representations for classification. We show that an appropriate configuration of an orthogonal Haar scattering defines such a translation-invariant representation on a graph. It is computed with a product of Haar wavelet transforms on the graph, and is thus closely related to non-orthogonal translation-invariant scattering transforms [18]. The connectivity of graph data is often unknown. In social or financial networks, we typically have information on individual agents, without knowing the interactions, and hence the connectivity, between agents. Building invariant representations on such graphs requires estimating the graph connectivity. Such information can be inferred from unlabeled data by analyzing the joint variability of signals defined on the unknown graph.
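The hierarchy of additions, subtractions and absolute values over pairs of coefficients can be sketched as a single network layer. The following is a minimal illustration, not the authors' implementation: the function name, the pairing-as-index-list convention, and the interleaved output ordering are choices made here for clarity.

```python
import numpy as np

def haar_scattering_layer(x, pairs):
    """One orthogonal Haar scattering layer (illustrative sketch).

    x     : 1-D array of 2n coefficients
    pairs : list of n index pairs (i, j) partitioning range(2n)

    For each pair, the layer outputs the sum x[i] + x[j] and the
    rectified difference |x[i] - x[j]|, so one layer contracts the
    signal while preserving an orthogonal pair of channels.
    """
    out = np.empty_like(x, dtype=float)
    for k, (i, j) in enumerate(pairs):
        out[2 * k] = x[i] + x[j]           # addition channel
        out[2 * k + 1] = abs(x[i] - x[j])  # subtraction + absolute value
    return out

# Toy example: 8 coefficients with a neighbour pairing, as on a regular grid.
x = np.array([1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 4.0])
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]
y = haar_scattering_layer(x, pairs)
# y = [4., 2., 4., 0., 6., 4., 4., 4.]
```

A deep network is obtained by cascading such layers, each with its own pairing of the previous layer's outputs.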
This paper studies unsupervised learning strategies which optimize deep network configurations without class label information. Most deep neural networks fight the curse of dimensionality by reducing the variance of the input data with contractive non-linearities [25, 2]. The danger of such contractions is to nearly collapse together vectors which belong to different classes. Learning must preserve discriminability despite the variance reduction resulting from these contractions. Hierarchical unsupervised architectures have been shown to provide efficient learning strategies [1]. We show that unsupervised learning can optimize an average discriminability by computing sparse features. Sparse unsupervised learning, which is usually NP-hard, reduces to a pair matching problem for Haar scattering, and can thus be computed with a polynomial-complexity algorithm. For Haar scattering on graphs, it recovers a hierarchical connectivity of groups of vertices. Under appropriate assumptions, we prove that pairing problems avoid the curse of dimensionality. It can recover an ...
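To make the pair matching step concrete, one can phrase it as follows: from unlabeled samples, pair the coordinates whose difference channel is small on average, since such pairs yield sparse (near-zero) outputs. The exact problem is a minimum-weight perfect matching, solvable in polynomial time; the greedy version below is only an illustrative sketch under that framing, with the cost choice (average l1 size of the difference channel) and all names being assumptions of this example.

```python
import numpy as np

def greedy_pairing(X):
    """Greedy sketch of unsupervised pair matching for sparse features.

    X : (n_samples, 2n) array of unlabeled signals.

    Pairs coordinates (i, j) with small average |x_i - x_j|, a proxy
    for output sparsity. A minimum-weight perfect matching (e.g. via
    the blossom algorithm) solves this exactly in polynomial time;
    this greedy loop only illustrates the objective.
    """
    n = X.shape[1]
    # cost[i, j]: mean l1 size of the difference channel for pair (i, j)
    cost = np.mean(np.abs(X[:, :, None] - X[:, None, :]), axis=0)
    np.fill_diagonal(cost, np.inf)
    unmatched = set(range(n))
    pairs = []
    while unmatched:
        sub = sorted(unmatched)
        # pick the cheapest remaining pair among unmatched coordinates
        k = np.argmin(cost[np.ix_(sub, sub)])
        a, b = divmod(int(k), len(sub))
        i, j = sub[a], sub[b]
        pairs.append((i, j))
        unmatched -= {i, j}
    return pairs

# Toy example: coordinates 0 & 2 co-vary across samples, as do 1 & 3.
X = np.array([[1.0, 5.0, 1.1, 5.2],
              [3.0, 0.0, 2.9, 0.1],
              [2.0, 4.0, 2.0, 3.8]])
result = greedy_pairing(X)  # recovers the pairing (0, 2), (1, 3)
```

On graph data, cascading this pairing across layers is what recovers a hierarchy of dyadic groups of connected vertices.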