The aim of this paper is to estimate density functions or distribution functions under the Wasserstein metric, a typical statistical distance that is often required in statistical learning. Based on the classical Bernstein approximation, a scheme is presented. To obtain error estimates for the scheme, the problem reduces to estimating the L^1 norm of the Bernstein approximation for monotone C^{-1} functions, which has rarely been discussed in classical approximation theory. Finally, we obtain a probability estimate in terms of the statistical distance.

§1 Introduction

In statistical learning tasks such as pattern recognition and classification, a key problem is to find the conditional probability distribution or conditional expectation, which is based on distributions estimated from sampled data. There are many schemes for estimating density functions and the corresponding distribution functions (e.g. [3,5,6]). However, most of them are based on traditional mathematical norms such as L^p (usually the mean square error), Sobolev and Besov norms, which require continuity of the approximated function (the density or distribution function). Most density or distribution functions, however, do not possess such a property, or do not lie in the L^p, Sobolev, or Besov spaces. Therefore a statistical distance, the Wasserstein metric [1], is introduced. Intuitively, if each density function is viewed as a unit amount of "earth", the distance between two density functions is the minimum cost of turning one pile into the other. Because of this analogy, the metric is known in computer science as the earth mover's distance. The following two examples show why we prefer the Wasserstein metric in our study.
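As background for the approximation scheme mentioned above, here is a minimal sketch of the classical Bernstein approximation of a function on [0, 1]; the function `F` and the degree 20 are illustrative choices, not taken from the paper:

```python
from math import comb

def bernstein_approx(f, n):
    """Return B_n(f), the degree-n Bernstein approximation of f on [0, 1]:
    B_n(f)(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    def Bn(x):
        return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k in range(n + 1))
    return Bn

# Example: approximate a monotone, distribution-like function F(x) = x^2.
F = lambda x: x**2
B20 = bernstein_approx(F, 20)
```

For f(x) = x^2 one has the classical identity B_n(f)(x) = x^2 + x(1 - x)/n, so the approximation error at each point shrinks like 1/n; the paper's interest is in the sharper question of bounding the L^1 error for monotone functions.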
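The earth-mover intuition has a simple closed form in one dimension: W_1 equals the L^1 distance between the two cumulative distribution functions, which for equal-size empirical samples is the mean absolute difference of the sorted samples. A minimal sketch (the sample arrays are illustrative assumptions):

```python
def wasserstein_1d(xs, ys):
    """W_1 (earth mover's) distance between two equal-size 1-D samples.
    In 1-D this reduces to the mean absolute difference of the sorted
    samples, i.e. the L^1 distance between empirical quantile functions."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# A point mass shifted by 1 is at distance exactly 1.
print(wasserstein_1d([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))  # -> 1.0
```

Note that this distance stays finite and well behaved even for point masses, for which no density exists in L^p; this is exactly the advantage over L^p-type norms discussed above.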