Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/236
Towards Understanding the Invertibility of Convolutional Neural Networks

Abstract: Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible. To understand this approximate invertibility phenomenon and how to leverage it more effectively, we focus on a theoretical explanation and develop a mathematical model of sparse signal recovery that is consistent with CNNs with random weights. We give an exact connection between a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empiric…
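The invertibility phenomenon the abstract describes can be illustrated with a toy sketch (not from the paper): for a single fully connected layer with i.i.d. Gaussian weights W and a ReLU nonlinearity, the identity E[2 · relu(wᵀx) · w] = x implies that the transposed weights act as an approximate decoder when the layer is sufficiently overcomplete. The dimensions and seed below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 20000                     # signal dimension, number of random filters

x = rng.standard_normal(n)
x /= np.linalg.norm(x)               # unit-norm test signal

W = rng.standard_normal((m, n))      # random Gaussian weights
z = np.maximum(W @ x, 0.0)           # one ReLU layer with random weights
x_hat = (2.0 / m) * (W.T @ z)        # transpose acts as an approximate decoder

err = np.linalg.norm(x_hat - x)
print(f"relative reconstruction error: {err:.3f}")
```

With m on the order of tens of thousands of filters, the reconstruction error is a few percent; this is a mean-field caricature of the paper's setting (no convolutional structure, no sparsity model), meant only to make the "random weights are approximately invertible" claim concrete.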


Cited by 44 publications (27 citation statements). References 1 publication.
“…Autoencoders with more than one hidden layer have been used for unsupervised feature learning [22], and recently there has been an analysis of the sparse coding performance of convolutional neural networks with one layer [20] and two layers of nonlinearities [39]. The connections between neural networks and sparse coding have also been recently explored in [14].…”
Section: Results
confidence: 99%
“…We believe that this alternate form of generative model, one based on calculating a transport map that is parameterized over the space of polynomial basis functions orthogonal to the distribution of the data, stands in contrast to the black-box nature of neural networks. Moreover, although certain works have explored the invertibility of deep neural networks (Lipton & Tripathi, 2017), (Gilbert, Zhang, Lee, Zhang, & Lee, 2017), in general a single output of a neural network might map to multiple latent vectors. Our transport maps, chosen over the space of diffeomorphisms, remain necessarily invertible and indeed this property is exploited in the generation of samples.…”
Section: Discussion
confidence: 99%
“…In this section we illustrate the non-linear LRIP on a simple example; that of recovering a vector from a random features embedding, which is a random map initially designed for kernel approximation, see [23,24]. Such a random embedding can be seen as a one-layer neural network with random weights, for which invertibility and preservation of information have recently been topics of interest [16,17]. Consider E = R^d and define S to be a Union of Subspaces, which is a popular model in compressed sensing [5], with controlled norm: [18], we choose a sampling that is a reweighted version of the original Fourier sampling for kernel approximation [23], for the Gaussian kernel with bandwidth σ > 0.…”
Section: Illustration
confidence: 99%
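The random features embedding mentioned in the last excerpt can be sketched as follows: a one-layer map with random weights whose inner products approximate a Gaussian kernel. This is the standard random Fourier features construction, not the reweighted sampling the excerpt describes; `sigma`, `m`, and the test vectors are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, sigma = 5, 50000, 1.0          # input dim, number of features, kernel bandwidth

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Random Fourier features for the Gaussian kernel:
#   phi(v) = sqrt(2/m) * cos(W v + b),  W_ij ~ N(0, 1/sigma^2),  b ~ U[0, 2*pi)
W = rng.standard_normal((m, d)) / sigma
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
phi = lambda v: np.sqrt(2.0 / m) * np.cos(W @ v + b)

approx = phi(x) @ phi(y)                                            # feature-space inner product
exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))    # true Gaussian kernel
print(f"kernel approximation error: {abs(approx - exact):.4f}")
```

Viewed this way, phi is exactly a one-layer network with random weights W and biases b, which is why the invertibility and information-preservation questions raised in [16,17] apply to it.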