This work is motivated by the study of local protein structure, which is defined by two variable dihedral angles that take values from probability distributions on the flat torus. Our goal is to equip the space $\mathcal{P}(\mathbb{R}^2/\mathbb{Z}^2)$ with a metric that quantifies local structural modifications due to changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Because it adapts to the geometry of the underlying space, we focus on the Wasserstein distance as a metric between distributions. We extend existing results of the theory of Optimal Transport to the d-dimensional flat torus $\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$, in particular a Central Limit Theorem. Moreover, we assess different techniques for two-sample goodness-of-fit testing in the two-dimensional case, based on the Wasserstein distance. We provide an implementation of these approaches in R. Their performance is illustrated by numerical experiments on synthetic data and protein structure data.
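To make the geometry concrete, the following is a minimal Python sketch (not the authors' R implementation) of the 2-Wasserstein distance between two equal-size empirical measures on the flat torus. The function names `torus_sq_dist` and `wasserstein2_torus` are hypothetical, and the brute-force search over permutations is only an assumption suited to very small samples; it relies on the standard fact that, for uniform empirical measures of equal size, an optimal coupling can be taken to be a permutation.

```python
import itertools
import math


def torus_sq_dist(x, y):
    """Squared geodesic distance between two points of the flat torus R^d / Z^d."""
    total = 0.0
    for a, b in zip(x, y):
        diff = abs(a - b) % 1.0
        # On a circle of length 1, go around whichever way is shorter.
        diff = min(diff, 1.0 - diff)
        total += diff * diff
    return total


def wasserstein2_torus(xs, ys):
    """2-Wasserstein distance between the uniform empirical measures on xs and ys.

    Brute force over all permutations: only usable for a handful of points.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    n = len(xs)
    best = min(
        sum(torus_sq_dist(xs[i], ys[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return math.sqrt(best / n)


# Two angle pairs (rescaled to [0, 1)) that are close only through the wrap-around.
print(wasserstein2_torus([(0.05, 0.5)], [(0.95, 0.5)]))
```

The two points in the usage line are 0.9 apart in the first coordinate when viewed in the unit square, but only 0.1 apart on the torus, which is the distance the sketch returns (up to floating-point rounding). For realistic sample sizes a dedicated optimal transport solver would be needed instead of the permutation search.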
Counterfactual frameworks have grown popular in explainable and fair machine learning, as they offer a natural notion of causation. However, state-of-the-art models to compute counterfactuals are either unrealistic or infeasible. In particular, while Pearl's causal inference provides appealing rules to calculate counterfactuals, it relies on a model that is unknown and hard to discover in practice. We address the problem of designing realistic and feasible counterfactuals in the absence of a causal model. We define transport-based counterfactual models as collections of joint probability distributions between observable distributions, and show their connection to causal counterfactuals. More specifically, we argue that optimal transport theory defines relevant transport-based counterfactual models, as they are numerically feasible, statistically faithful, and can even coincide with causal counterfactual models. We illustrate the practicality of these models by defining sharper fairness criteria than typical group fairness conditions.
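As a toy illustration of a transport-based counterfactual, the sketch below pairs each observation in one group with an observation in the other group through the optimal transport coupling. It assumes a single real-valued feature, equal-size samples, and a convex cost such as the squared distance, in which case the optimal coupling is the monotone (rank-to-rank) matching. The function name `ot_counterfactuals_1d` and the data are hypothetical and not taken from the paper.

```python
def ot_counterfactuals_1d(group0, group1):
    """Map each value of group0 to its optimal-transport counterpart in group1.

    Assumes one real feature and equal sample sizes, so that the optimal
    coupling for a convex cost matches the k-th smallest value of group0
    with the k-th smallest value of group1.
    """
    if len(group0) != len(group1):
        raise ValueError("this sketch assumes samples of equal size")
    # Indices of group0 ordered by increasing feature value.
    order0 = sorted(range(len(group0)), key=lambda i: group0[i])
    sorted1 = sorted(group1)
    counterfactual = [None] * len(group0)
    for rank, i in enumerate(order0):
        counterfactual[i] = sorted1[rank]
    return counterfactual


# Hypothetical feature values observed in two groups.
group0 = [50, 30, 40]
group1 = [45, 65, 55]
print(ot_counterfactuals_1d(group0, group1))  # [65, 45, 55]
```

Here the largest value of the first group (50) is paired with the largest value of the second group (65), and so on down the ranks. No causal model is used: the pairing depends only on the two observed samples.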
We prove a Central Limit Theorem for the empirical optimal transport cost, $\sqrt{\frac{nm}{n+m}}\,\{\mathcal{T}_c(P_n, Q_m) - \mathcal{T}_c(P, Q)\}$, in the semidiscrete case, i.e., when the distribution $P$ is supported on $N$ points, but without assumptions on $Q$. We show that the asymptotic distribution is the supremum of a centered Gaussian process, which is Gaussian under some additional conditions on the probability $Q$ and on the cost. Such results imply the central limit theorem for the $p$-Wasserstein distance, for $p \geq 1$. This means that, for fixed $N$, the curse of dimensionality is avoided. To better understand the influence of $N$, we provide bounds on $E|W_1(P, Q_m) - W_1(P, Q)|$ depending on $m$ and $N$. Finally, the semidiscrete framework provides control on the second derivative of the dual formulation, which yields the first central limit theorem for the optimal transport potentials. The results are supported by simulations that help to visualize the given limits and bounds. We also analyse the cases where the classical bootstrap works.
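The quantity $W_1(P, Q_m)$ that appears in the bounds can be computed explicitly in a simple special case. The sketch below assumes measures on the real line, where $W_1$ equals the integral of the absolute difference between the two cumulative distribution functions; the paper's setting is more general. The function name `w1_semidiscrete_1d` is hypothetical.

```python
def w1_semidiscrete_1d(points_p, weights_p, sample_q):
    """W1 between P = sum_i w_i * delta_{x_i} and the empirical measure of sample_q.

    One-dimensional case only: W1 is the integral of |F_P - F_Qm|.
    """
    m = len(sample_q)
    # Signed atoms: +w_i for the support points of P, -1/m for each sample of Q.
    atoms = sorted(
        [(x, w) for x, w in zip(points_p, weights_p)]
        + [(y, -1.0 / m) for y in sample_q]
    )
    total = 0.0
    cdf_gap = 0.0  # running value of F_P - F_Qm just to the right of the current atom
    for (x, mass), (x_next, _) in zip(atoms, atoms[1:]):
        cdf_gap += mass
        total += abs(cdf_gap) * (x_next - x)
    return total


# P is a point mass at 0; the sample from Q is {1, 3}.
print(w1_semidiscrete_1d([0.0], [1.0], [1.0, 3.0]))  # 2.0
```

In the usage line, half of the mass at 0 travels a distance of 1 and the other half a distance of 3, which gives the printed value of 2.0. Repeating the computation over many simulated samples of size m would give a Monte Carlo view of how the empirical cost fluctuates around its population value.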