Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique include two-sample tests (used to determine whether two sets of observations arise from the same distribution), covariate shift correction, local learning, measures of independence, and density estimation.

Kernel methods are widely used in supervised learning [1,2,3,4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information theoretic approaches [5,6] have long been dominant. Recent examples include [7] for constructing graphical models, [8] for feature extraction, and [9] for independent component analysis. By and large, these methods share a common drawback: computing quantities such as the mutual information, entropy, or Kullback-Leibler divergence requires sophisticated space partitioning and/or bias correction strategies [9,10].

In this paper we give an overview of methods that compute distances between distributions without the need for intermediate density estimation. Moreover, these techniques allow algorithm designers to specify which properties of a distribution are most relevant to their problems. We are optimistic that our embedding approach to distribution representation and analysis will lead to algorithms that are simpler and more effective than entropy-based methods in a broad range of applications.

We begin our presentation in Section 1 with an overview of reproducing kernel Hilbert spaces (RKHSs) and a description of how probability distributions can be represented as elements of an RKHS. In Section 2, we show how these representations may be used to address a variety of problems, including homogeneity testing (Section 2.1), covariate shift correction (Section 2.2), independence measurement (Section 2.3), feature extraction (Section 2.4), and density estimation (Section 2.5).
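To make the embedding distance concrete before the formal development, the following is a minimal sketch, not the paper's formal construction: each sample is mapped to its empirical mean embedding, and the squared RKHS distance between the two embeddings expands entirely into kernel evaluations, so no density estimate is ever formed. The Gaussian kernel, the bandwidth sigma, and the helper names gaussian_kernel and mmd2 are illustrative assumptions rather than choices fixed by the text.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Biased empirical estimate of the squared distance between the mean
    # embeddings of the two samples in the RKHS induced by the kernel:
    #   ||mu_X - mu_Y||^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')],
    # estimated here by averaging kernel values over all pairs of points.
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() - 2.0 * Kxy.mean() + Kyy.mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2)))
print(f"same distribution: {same:.4f}  shifted mean: {diff:.4f}")
```

When the two samples come from the same distribution, the statistic concentrates near zero; a mean shift produces a visibly larger value. Section 2.1 develops this kind of comparison into a formal homogeneity test.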