We formally map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$, obtained by convolving $p_X$ with a fixed factorial kernel: $p_{\mathbf{Y}}$ is referred to as the M-density and the factorial kernel as the multimeasurement noise model (MNM). The M-density is smoother than $p_X$ and easier to learn and sample from, yet for large $M$ the two problems are mathematically equivalent, since $X$ can be estimated exactly given $\mathbf{Y} = \mathbf{y}$ using the Bayes estimator $\widehat{x}(\mathbf{y}) = \mathbb{E}[X \mid \mathbf{Y} = \mathbf{y}]$. To formulate the problem, we derive $\widehat{x}(\mathbf{y})$ for Poisson and Gaussian MNMs, expressed in closed form in terms of the unnormalized $p_{\mathbf{Y}}$. This leads to a simple least-squares objective for learning parametric energy and score functions. We present various parametrization schemes of interest, including one in which studying Gaussian M-densities directly leads to multidenoising autoencoders; this is the first theoretical connection made in the literature between denoising autoencoders and empirical Bayes. Samples from $p_X$ are obtained by walk-jump sampling (Saremi & Hyvärinen, 2019): underdamped Langevin MCMC is used to sample from $p_{\mathbf{Y}}$ (walk), followed by the multimeasurement Bayes estimation of $X$ (jump). We study permutation-invariant Gaussian M-densities on the MNIST, CIFAR-10, and FFHQ-256 datasets, and demonstrate the effectiveness of this framework for realizing fast-mixing, stable Markov chains in high dimensions.
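For orientation, here is a sketch of the Gaussian case of this closed form, derived directly from the kernel $p(y_m|x) = \mathcal{N}(y_m;\, x,\, \sigma_m^2 I_d)$, where $\sigma_m$ denotes the noise scale of the $m$-th measurement. Differentiating $p(\mathbf{y}) = \int p(x) \prod_m p(y_m|x)\, dx$ with respect to $y_m$ gives

$$\nabla_{y_m} \log p(\mathbf{y}) = \frac{1}{p(\mathbf{y})} \int \frac{x - y_m}{\sigma_m^2}\, p(x) \prod_{m'=1}^{M} p(y_{m'}|x)\, dx = \frac{\mathbb{E}[X \mid \mathbf{Y} = \mathbf{y}] - y_m}{\sigma_m^2},$$

and therefore

$$\widehat{x}(\mathbf{y}) = y_m + \sigma_m^2\, \nabla_{y_m} \log p(\mathbf{y}) \quad \text{for any } m \in \{1, \dots, M\}.$$

Since the normalizing constant of $p_{\mathbf{Y}}$ contributes only an additive constant to $\log p(\mathbf{y})$, which vanishes under the gradient, the estimator indeed depends only on the unnormalized M-density.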
INTRODUCTION

Consider a collection of i.i.d. samples $\{x_i\}_{i=1}^n$, assumed to have been drawn from an unknown distribution with density $p_X$ in $\mathbb{R}^d$. An important problem in probabilistic modeling is the task of drawing independent samples from $p_X$, which has numerous potential applications. This problem is typically approached in two phases: approximating $p_X$, and drawing samples from the approximated density. In unnormalized models, the first phase is approached by learning the energy function $f_X$ associated with the Gibbs distribution $p_X \propto \exp(-f_X)$; for the second phase, one must resort to Markov chain Monte Carlo (MCMC) methods, such as Langevin MCMC, which are typically very slow to mix in high dimensions. MCMC sampling is considered an "art", and we do not have black-box samplers that converge fast and are stable for complex (natural) distributions. The source of the problem is mainly attributed to the fact that the energy functions of interest are typically highly nonconvex.

A broad sketch of our solution to this problem is to model a smoother density in an $M$-fold expanded space. The new density $p(\mathbf{y})$, called the M-density, is defined in $\mathbb{R}^{Md}$, where the bold $\mathbf{y}$ is shorthand for $(y_1, \dots, y_M)$. The M-density is smoother in the sense that its marginals $p_m(y_m)$ are obtained by the convolution $p_m(y_m) = \int p_m(y_m|x)\, p(x)\, dx$ with a smoothing kernel $p_m(y_m|x)$, which for most of the paper we take to be the isotropic Gaussian:

$$p_m(y_m|x) = \mathcal{N}(y_m;\, x,\, \sigma_m^2 I_d).$$

Although we bypass learning $p(x)$, the new formalism allows for generating samples from $p(x)$, since $X$ can be estimated exactly given $\mathbf{Y} = \mathbf{y}$ (for large $M$). To give a physical picture, the approac...
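To make the construction above concrete, the following is a minimal sketch in PyTorch of the Gaussian MNM and of the least-squares objective mentioned in the abstract. The network `xhat_net`, the flattened-input parametrization, and the noise scales are illustrative assumptions, not the paper's architecture or hyperparameters.

```python
# Minimal sketch of the Gaussian MNM: y_m = x + N(0, sigma_m^2 I_d), m = 1..M.
# Names and hyperparameters here are illustrative assumptions.
import torch

M, d = 4, 784                        # number of measurements, data dimension
sigmas = torch.full((M,), 1.0)       # per-measurement noise scales (assumed equal)

def make_measurements(x):
    """Draw y = (y_1, ..., y_M) from the factorial kernel; x: (batch, d)."""
    noise = torch.randn(x.shape[0], M, d) * sigmas.view(1, M, 1)
    return x.unsqueeze(1) + noise    # shape (batch, M, d)

def least_squares_loss(xhat_net, x):
    """Regress a parametric estimator onto clean x: the minimizer of the
    expected squared error is the Bayes estimator E[X | Y = y]."""
    y = make_measurements(x)                   # (batch, M, d)
    xhat = xhat_net(y.flatten(start_dim=1))    # (batch, d), estimate of E[X|Y=y]
    return ((xhat - x) ** 2).sum(dim=1).mean()
```

In walk-jump sampling, an estimator learned this way supplies the jump step: Langevin MCMC explores the smoother M-density $p(\mathbf{y})$ (walk), and $\widehat{x}(\mathbf{y})$ maps states of the chain back to the data space (jump).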