The Wasserstein distance between two probability measures on a metric space is a measure of closeness with applications in statistics, probability, and machine learning. In this work, we consider the fundamental question of how quickly the empirical measure obtained from n independent samples from µ approaches µ in the Wasserstein distance of any order. We prove sharp asymptotic and finite-sample results for this rate of convergence for general measures on general compact metric spaces. Our finite-sample results show the existence of multi-scale behavior, where measures can exhibit radically different rates of convergence as n grows.
1 Assumptions
We are concerned with measures on a compact metric space X. The first assumption is entirely standard and allows us to avoid many measure-theoretic difficulties:
Assumption 1. The metric space X is Polish, and all measures are Borel.
Since we limit ourselves to the compact case, diam(X) is necessarily finite, and for normalization purposes we assume the following.
Assumption 2. diam(X) ≤ 1.
Assumption 2 can always be made to hold by a simple rescaling of the metric.
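The convergence of the empirical measure to µ can be illustrated numerically in the simplest setting. The sketch below is not taken from the paper: it assumes X = [0, 1] (so that diam(X) ≤ 1 holds), takes µ to be the uniform distribution, and uses order 1 only. In one dimension W_1 equals the integral of the absolute difference between the two quantile functions, which makes the distance exactly computable. The function names `abs_integral`, `w1_to_uniform`, and `mean_w1` are hypothetical.

```python
import random


def abs_integral(c, a, b):
    """Exact integral of |c - t| over t in [a, b], assuming a <= b."""
    if c <= a:
        return ((b - c) ** 2 - (a - c) ** 2) / 2
    if c >= b:
        return ((c - a) ** 2 - (c - b) ** 2) / 2
    return ((c - a) ** 2 + (b - c) ** 2) / 2


def w1_to_uniform(samples):
    """W_1 between the empirical measure of `samples` and Uniform[0, 1].

    Assumption (illustration only): mu is uniform on [0, 1], so its
    quantile function is u -> u. The empirical quantile function equals
    the i-th order statistic on the interval [i/n, (i+1)/n].
    """
    xs = sorted(samples)
    n = len(xs)
    return sum(abs_integral(x, i / n, (i + 1) / n) for i, x in enumerate(xs))


def mean_w1(n, trials=200, seed=0):
    """Monte Carlo estimate of the expected W_1 distance for sample size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += w1_to_uniform([rng.random() for _ in range(n)])
    return total / trials


if __name__ == "__main__":
    for n in (25, 100, 400, 1600):
        print(n, mean_w1(n))
```

Running the script prints the estimated expected distance for increasing n, which should shrink as n grows. This one-dimensional case shows only a single rate; it does not exhibit the multi-scale behavior described for general measures.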
The growing role of data-driven approaches to scientific discovery has unveiled a large class of models that involve latent transformations with a rigid algebraic constraint. Three-dimensional molecule reconstruction in cryo-electron microscopy (cryo-EM) is a central problem in this class. Despite decades of algorithmic and software development, there is still little theoretical understanding of the sample complexity of this problem, that is, the number of images required for 3-D reconstruction. Here we consider multi-reference alignment (MRA), a simple model that captures fundamental aspects of the statistical and algorithmic challenges arising in cryo-EM and related problems. In MRA, an unknown signal is subject to two types of corruption: a latent cyclic shift and the more traditional additive white noise. The goal is to recover the signal at a certain precision from independent samples. While at high signal-to-noise ratio (SNR) the number of observations needed to recover a generic signal is proportional to 1/SNR, we prove that it rises to a surprising 1/SNR³ in the low SNR regime. This precise phenomenon was observed empirically more than twenty years ago for cryo-EM but has remained unexplained to date. Furthermore, our techniques can easily be extended to the heterogeneous MRA model where the samples come from a mixture of signals, as is often the case in applications such as cryo-EM, where molecules may have different conformations. This provides a first step towards a statistical theory for heterogeneous cryo-EM.
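The MRA observation model described above (a latent cyclic shift followed by additive white noise) can be sketched as a small data generator. This is a minimal illustration, not the authors' code: it assumes shifts drawn uniformly at random and i.i.d. Gaussian noise with standard deviation `sigma`, and the names `cyclic_shift` and `mra_samples` are hypothetical.

```python
import random


def cyclic_shift(x, s):
    """Cyclically shift x by s positions: out[k] = x[(k - s) mod L]."""
    L = len(x)
    return [x[(k - s) % L] for k in range(L)]


def mra_samples(x, n, sigma, seed=0):
    """Draw n observations of the signal x under the MRA model.

    Assumptions (illustration only): each shift is uniform on
    {0, ..., L-1}, and the noise is i.i.d. Gaussian with standard
    deviation sigma. The shifts are latent, so they are not returned.
    """
    rng = random.Random(seed)
    L = len(x)
    observations = []
    for _ in range(n):
        s = rng.randrange(L)  # latent cyclic shift
        shifted = cyclic_shift(x, s)
        observations.append([v + sigma * rng.gauss(0.0, 1.0) for v in shifted])
    return observations


if __name__ == "__main__":
    signal = [1.0, 0.0, -2.0, 0.5]
    for y in mra_samples(signal, n=3, sigma=0.1):
        print(y)
```

Because the shifts are discarded, any estimator sees only the noisy rows, and the signal is recoverable at best up to a global cyclic shift.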
Motivated by geometric problems in signal processing, computer vision, and structural biology, we study a class of orbit recovery problems where we observe very noisy copies of an unknown signal, each acted upon by a random element of some group (such as Z/p or SO(3)). The goal is to recover the orbit of the signal under the group action in the high-noise regime. This generalizes problems of interest such as multi-reference alignment (MRA) and the reconstruction problem in cryo-electron microscopy (cryo-EM). We obtain matching lower and upper bounds on the sample complexity of these problems in high generality, showing that the statistical difficulty is intricately determined by the invariant theory of the underlying symmetry group. In particular, we determine that for cryo-EM with noise variance σ² and uniform viewing directions, the number of samples required scales as σ⁶. We match this bound with a novel algorithm for ab initio reconstruction in cryo-EM, based on invariant features of degree at most 3. We further discuss how to recover multiple molecular structures from heterogeneous cryo-EM samples.
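To make the idea of invariant features of low degree concrete, the sketch below treats the cyclic group Z/p acting on real vectors by cyclic shifts, which is far simpler than the SO(3) setting of cryo-EM. It computes three standard shift-invariant quantities of degree 1, 2, and 3 in the signal: the mean, the power spectrum, and the bispectrum. This choice of features is an assumption made for illustration and is not the paper's reconstruction algorithm; the names `dft` and `invariants_up_to_degree_3` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    L = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / L) for t in range(L))
        for k in range(L)
    ]


def invariants_up_to_degree_3(x):
    """Return (mean, power spectrum, bispectrum) of a real signal x.

    A cyclic shift by s multiplies the k-th Fourier coefficient by a
    phase exp(-2*pi*i*k*s/L). The phases cancel in each feature below,
    so all three are unchanged by shifts.
    """
    L = len(x)
    X = dft(x)
    mean = X[0].real / L  # degree 1
    power = [abs(c) ** 2 for c in X]  # degree 2
    bispectrum = [  # degree 3
        [X[k1] * X[k2] * X[(k1 + k2) % L].conjugate() for k2 in range(L)]
        for k1 in range(L)
    ]
    return mean, power, bispectrum


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5, 3.0, 0.0]
    shifted = x[2:] + x[:2]
    print(invariants_up_to_degree_3(x)[1])
    print(invariants_up_to_degree_3(shifted)[1])
```

The two printed power spectra agree up to floating-point error, since the signal and its shifted copy lie in the same orbit.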
Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown nondecreasing regression function f from independent pairs (x_i, y_i), i = 1, . . . , n. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart, where one is given only the unordered sets {x_1, . . . , x_n} and {y_1, . . . , y_n}. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on y_i and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.
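The link between the uncoupled problem and optimal transport is easiest to see in the noiseless case, which the following sketch assumes. If y_i = f(x_i) exactly with f nondecreasing, then pairing the k-th smallest x with the k-th smallest y (the monotone coupling, which is the optimal transport plan on the real line for convex costs) recovers the values of f at the design points. This is an illustration only and is not the paper's algorithm, which must handle noise; the name `monotone_matching` is hypothetical.

```python
def monotone_matching(xs, ys):
    """Pair the k-th smallest x with the k-th smallest y.

    Assumption (illustration only): there is no noise, so the set of
    ys is exactly {f(x) for x in xs} for some nondecreasing f.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return list(zip(sorted(xs), sorted(ys)))


if __name__ == "__main__":
    def f(t):
        return t ** 3  # a nondecreasing function, chosen for the demo

    xs = [0.3, -1.0, 2.0, 0.0]
    ys = [f(2.0), f(0.0), f(0.3), f(-1.0)]  # same values, order scrambled
    for x, y in monotone_matching(xs, ys):
        print(x, y)
```

With noisy responses the sorted ys no longer equal f at the sorted xs, which is where the moment-matching and deconvolution arguments mentioned above come in.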