Unsupervised Domain Adaptation (UDA) aims to leverage knowledge from a labeled source domain to help the task on an unlabeled target domain. A key step in UDA is minimizing the cross-domain distribution divergence. In this paper, we first propose a novel discrepancy metric, referred to as Cross Domain Mean Approximation (CDMA) discrepancy, to evaluate the distribution difference between the source and target domains; it computes the sum of the squared distances from the samples of each domain to the mean of the other domain. Second, Joint Distribution Adaptation based on Cross Domain Mean Approximation (JDA-CDMA) is developed on the basis of CDMA to extract shared features and simultaneously reduce the marginal and conditional distribution discrepancies between domains during the label refinement process. Third, we construct a classifier utilizing the CDMA metric and neighbor information. Finally, the proposed feature extraction approach and classifier are combined to realize transfer learning. Results from extensive experiments on five visual benchmarks, including object, face, and digit images, show that the proposed methods outperform state-of-the-art unsupervised domain adaptation methods.

I. INTRODUCTION

In machine vision, many machine learning methods, such as Linear Regression [1], Logistic Regression (LR) [2], k-Nearest Neighbor (k-NN) [3], Bayesian [4], Decision Tree [5], and Support Vector Machine (SVM) [6], are applied to image classification tasks. However, when the image feature representation is too redundant or of poor quality, their accuracy is lowered. Therefore, it is of great importance to extract high-quality image features. Feature extraction, as an important means of mining latent image knowledge, is not only conducive to an in-depth understanding of image content but also crucial to improving the accuracy of image classification and recognition [7]. Consequently, it has attracted much attention from researchers.
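As a rough illustration of the CDMA discrepancy described above, the following sketch computes the sum of squared distances from each source sample to the target-domain mean and from each target sample to the source-domain mean. This is inferred from the textual definition only; the paper's exact weighting and normalization may differ.

```python
import numpy as np

def cdma_discrepancy(Xs, Xt):
    """Illustrative CDMA discrepancy between a source sample matrix Xs
    (n_s x d) and a target sample matrix Xt (n_t x d): squared distances
    from each domain's samples to the mean of the other domain.
    Hypothetical sketch, not the authors' reference implementation."""
    mu_s = Xs.mean(axis=0)              # source-domain mean vector
    mu_t = Xt.mean(axis=0)              # target-domain mean vector
    d_s = np.sum((Xs - mu_t) ** 2)      # source samples to target mean
    d_t = np.sum((Xt - mu_s) ** 2)      # target samples to source mean
    return d_s + d_t
```

Under this reading, the discrepancy is zero only when every sample of each domain coincides with the other domain's mean, and it grows as the two domains drift apart.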
Principal Component Analysis (PCA) [8], Independent Component Analysis (ICA) [9], Linear Discriminant Analysis (LDA) [10], Maximum Margin Criterion (MMC) [11], and other algorithms are often used for feature extraction. In order to discover the nonlinear structure hidden in high-dimensional data and mine the local geometric structure of the data, Laplacian Eigen-maps (LE) [12], Locality Linear Embedding