Applications based on machine learning models have become an indispensable part of everyday life and the professional world. A critical question has recently arisen: do algorithmic decisions discriminate against specific population groups or minorities? In this paper, we show the importance of understanding how bias can be introduced into automatic decisions. We first present a mathematical framework for the fair learning problem, focusing on the binary classification setting. We then quantify the presence of bias using the standard Disparate Impact index on the real and well-known Adult Income data set. Finally, we evaluate the performance of different approaches that aim to reduce bias in binary classification outcomes. Importantly, we show that some intuitive methods are ineffective with respect to the Statistical Parity criterion. This highlights that building fair machine learning models can be a particularly challenging task, especially when the training observations themselves contain bias.
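The Disparate Impact index mentioned above is commonly defined as the ratio of positive-outcome rates between the protected group and the remaining population, with the "80% rule" flagging values below 0.8. A minimal sketch of this computation (function name and toy data are illustrative, not from the paper):

```python
import numpy as np

def disparate_impact(y_pred, protected):
    """Disparate Impact: ratio of the positive-outcome rate in the
    protected group (protected == 1) to that of the rest.

    A value close to 1 indicates Statistical Parity; the common
    "80% rule" flags values below 0.8 as potentially discriminatory.
    """
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected, dtype=bool)
    rate_protected = y_pred[protected].mean()
    rate_other = y_pred[~protected].mean()
    return rate_protected / rate_other

# Toy example: 4 positives out of 10 in the protected group,
# 8 positives out of 10 in the other group -> DI = 0.4 / 0.8 = 0.5.
y_pred = [1] * 4 + [0] * 6 + [1] * 8 + [0] * 2
protected = [1] * 10 + [0] * 10
print(disparate_impact(y_pred, protected))  # 0.5
```

On a real data set such as Adult Income, `protected` would encode a sensitive attribute (e.g. gender) and `y_pred` the classifier's binary decisions.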
Wasserstein barycenters and variance-like criteria based on the Wasserstein distance are used in many problems to analyze the homogeneity of collections of distributions and structural relationships between observations. We propose estimating the quantiles of the empirical process of the Wasserstein variation using a bootstrap procedure. We then use these results for statistical inference on a distribution registration model with general deformation functions. The tests are based on the variance of the distributions with respect to their Wasserstein barycenter, for which we prove central limit theorems, including bootstrap versions. When $G$ is a parametric class, estimation of the warping functions is studied in [2]. However, estimation/registration procedures may lead to inconsistent conclusions if the chosen deformation class $G$ is too small. It is therefore important to be able to assess the fit to the deformation model given by a particular choice of $G$; this is the main goal of this paper. We note that, within this framework, statistical inference on deformation models for distributions was first studied in [20]. Here we provide a different approach that allows us to deal with more general deformation classes. The pioneering works [15, 25] study the existence of relationships between distributions $F$ and $G$ through a discrepancy measure $\Delta(F,G)$ built from the Wasserstein distance. The authors test the null hypothesis $H_0: \Delta(F,G) > \Delta_0$ against the alternative $H_a: \Delta(F,G) \leq \Delta_0$ for a chosen threshold $\Delta_0$; thus, when the null hypothesis is rejected, there is statistical evidence that the two distributions are similar with respect to the chosen criterion. In the same vein, we define a notion of variation of distributions using the Wasserstein distance $W_r$ on the set $\mathcal{W}_r(\mathbb{R}^d)$ of probability measures with finite $r$th moments, where $r \geq 1$. This notion generalizes the concept of variance for random distributions over $\mathbb{R}^d$.
This quantity can be defined as
$$V_r(\mu_1,\dots,\mu_J) = \min_{\eta \in \mathcal{W}_r(\mathbb{R}^d)} \Big( \frac{1}{J} \sum_{j=1}^J W_r^r(\mu_j, \eta) \Big)^{1/r},$$
which measures the spread of the distributions $\mu_1,\dots,\mu_J$. Then, to measure closeness to a deformation model, we look at the minimal variation among warped distributions, a quantity we may regard as a minimal alignment cost. Under some mild conditions, a deformation model holds if and only if this minimal alignment cost is zero, so we can base our assessment of a deformation model on this quantity. As in [15, 25], we provide results (a central limit theorem and bootstrap versions) that enable us to reject that the minimal alignment cost exceeds some threshold, and hence to conclude that it is below that threshold. Our results are given in a setup of general, nonparametric classes of warping functions. We also provide results in the somewhat more restrictive setup of the more classical goodness-of-fit problem for the deformation model. Note that a general central limit theorem for the Wasserstein distance is available in [18]. The paper is organized as follows. The main facts about the Wasserstein variation are presented...
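On the real line with $r = 2$, both the barycenter and the variation reduce to computations on quantile functions, since the $W_2$ barycenter of one-dimensional distributions has quantile function equal to the average of the individual quantile functions. A minimal numerical sketch of this special case (function name and toy data are illustrative assumptions, not the paper's estimator):

```python
import numpy as np

def wasserstein2_variation(samples, grid_size=1000):
    """Empirical W_2 variation of a collection of 1-D samples.

    On the real line, W_2 between two distributions equals the L2
    distance between their quantile functions, and the W_2 barycenter's
    quantile function is the average of the individual ones; so the
    variation V_2 can be approximated on a grid of quantile levels.
    """
    u = (np.arange(grid_size) + 0.5) / grid_size      # quantile levels
    Q = np.array([np.quantile(s, u) for s in samples])
    barycenter_Q = Q.mean(axis=0)                     # barycenter quantiles
    # V_2^2 = average squared W_2 distance to the barycenter
    v2 = np.mean((Q - barycenter_Q) ** 2)
    return np.sqrt(v2)

rng = np.random.default_rng(0)
# Two location shifts of the same shape: the variation reflects the shift.
samples = [rng.normal(0.0, 1.0, 2000), rng.normal(1.0, 1.0, 2000)]
print(wasserstein2_variation(samples))  # ~0.5 (half the location gap)
```

For a pure location shift, each distribution sits at $W_2$ distance one half of the gap from the barycenter, so the variation is close to 0.5 here.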
We provide a central limit theorem for the Monge–Kantorovich distance between two empirical distributions with sizes $n$ and $m$, $\mathcal{W}_p(P_n,Q_m), \ p\geqslant 1,$ for observations on the real line. In the case $p>1$ our assumptions are sharp in terms of moments and smoothness. We prove results dealing with the choice of centring constants. We provide a consistent estimate of the asymptotic variance, which enables us to build two-sample tests and confidence intervals to certify the similarity between two distributions. These are then used to assess a new criterion of data set fairness in classification.
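For one-dimensional samples, the empirical Monge–Kantorovich distance is easy to compute, and a bootstrap gives a rough confidence interval for the distance between the underlying distributions. The sketch below uses a simple percentile bootstrap; the paper's procedure is based on a CLT with a consistently estimated asymptotic variance, which this only approximates (function name and data are illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance  # W_1 for 1-D samples

def wasserstein_bootstrap_ci(x, y, n_boot=500, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for W_1(P, Q)
    from two independent 1-D samples x and y."""
    rng = np.random.default_rng(seed)
    stats = [
        wasserstein_distance(rng.choice(x, x.size), rng.choice(y, y.size))
        for _ in range(n_boot)
    ]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return wasserstein_distance(x, y), (lo, hi)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 1000)
y = rng.normal(0.3, 1.0, 1000)  # same shape, shifted by 0.3: W_1 = 0.3
est, (lo, hi) = wasserstein_bootstrap_ci(x, y)
print(f"W1 = {est:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

If the whole interval lies below a pre-chosen threshold, the two distributions can be certified as similar in the sense described above.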
The effectiveness of EMG biofeedback with neurorehabilitation robotic platforms has not been previously addressed. The present work evaluates the influence of EMG-based visual biofeedback on user performance during EMG-driven bilateral exercises with a robotic hand exoskeleton. Eighteen healthy subjects were asked to perform 1-min randomly generated sequences of hand gestures (rest, open, and close) in four different conditions resulting from the combination of using or not (1) EMG-based visual biofeedback and (2) kinesthetic feedback from the exoskeleton movement. The user performance in each test was measured by computing the similarity between the target gestures and the recognized user gestures using the L2 distance. Statistically significant differences in subject performance were found depending on the type of feedback provided (p-value = 0.0124). Pairwise comparisons showed that the L2 distance was statistically significantly lower when only EMG-based visual feedback was present (2.89 ± 0.71) than with kinesthetic feedback alone (3.43 ± 0.75, p-value = 0.0412) or the combination of both (3.39 ± 0.70, p-value = 0.0497). Hence, EMG-based visual feedback enables subjects to increase their control over the movement of the robotic platform by assessing their muscle activation in real time. This type of feedback could help patients learn more quickly how to activate robot functions, increasing their motivation towards rehabilitation.
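The performance metric above compares two gesture sequences with the L2 distance. A minimal sketch, assuming gestures are coded numerically (e.g. rest = 0, open = 1, close = 2) and sampled at a fixed rate over the trial; the coding scheme and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def gesture_l2(target, recognized):
    """L2 distance between a target gesture sequence and the sequence
    of recognized user gestures; a lower value means the user tracked
    the target more closely."""
    target = np.asarray(target, dtype=float)
    recognized = np.asarray(recognized, dtype=float)
    return np.sqrt(np.sum((target - recognized) ** 2))

# Toy sequences: the recognized gestures lag the target by one sample.
target     = [0, 1, 1, 2, 2, 0]
recognized = [0, 0, 1, 1, 2, 2]
print(gesture_l2(target, recognized))  # sqrt(1 + 1 + 4) ≈ 2.449
```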