Nonparametric estimation of the conditional expectation E(Y | U) of an outcome Y given a covariate vector U is of primary importance in many statistical applications such as prediction and personalized medicine. In some problems, there is an additional auxiliary variable Z in the training dataset used to construct estimators, but Z is not available for future prediction or for selecting patient treatment in personalized medicine. For example, longitudinal outcomes may be observed in the training dataset while only the last outcome Y is of interest in the future prediction or analysis. The longitudinal outcomes other than the last point then constitute the variable Z, which is observed and related to both Y and U. Previous work on how to make use of Z in the estimation of E(Y | U) mainly focused on using Z in the construction of a linear function of U to reduce the covariate dimension for better estimation. Using E(Y | U) = E{E(Y | U, Z) | U}, we propose a two-step estimation of the inner and outer expectations, respectively, with sufficient dimension reduction for kernel estimation in both steps. The information from Z is utilized not only in dimension reduction but also directly in the estimation. Because different ways of dimension reduction exist, we construct two estimators that may improve upon the estimator that does not use Z. The improvements are shown in the convergence rate of the estimators as the sample size increases to infinity, as well as in finite sample simulation performance. A real data analysis on the selection of mammography intervention is presented for illustration.

In many statistical applications, a key step is to estimate the conditional expectation of Y given U = u_0,

ψ(u_0) = E(Y | U = u_0),

where Y is a response of interest, U is a vector of covariates, and u_0 is a given specific value of U. Clearly, the prediction of a future Y at U = u_0 is one example.
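Before turning to further examples, the two-step decomposition E(Y | U) = E{E(Y | U, Z) | U} described above can be sketched as two nested Nadaraya–Watson kernel regressions. This is a minimal illustration only (no dimension reduction step); all function names, the simulated data, and the bandwidth choices are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def nw_estimate(x_train, y_train, x0, h):
    # Nadaraya-Watson estimator of E(y | x = x0) with a product Gaussian kernel
    w = np.exp(-0.5 * np.sum(((x_train - x0) / h) ** 2, axis=1))
    return np.sum(w * y_train) / np.sum(w)

def two_step_estimate(u_train, z_train, y_train, u0, h_inner, h_outer):
    # Step 1 (inner): estimate g(u, z) = E(Y | U = u, Z = z) at each training point
    uz_train = np.column_stack([u_train, z_train])
    g_hat = np.array([nw_estimate(uz_train, y_train, uz, h_inner) for uz in uz_train])
    # Step 2 (outer): smooth the fitted inner values over U to get E{g(U, Z) | U = u0}
    return nw_estimate(u_train, g_hat, u0, h_outer)

# Demo: Z carries the signal from U to Y; the true value of E(Y | U = 0) is 0
rng = np.random.default_rng(0)
n = 200
u = rng.normal(size=(n, 2))
z = (u[:, 0] + 0.2 * rng.normal(size=n)).reshape(-1, 1)
y = z.ravel() + 0.2 * rng.normal(size=n)
est = two_step_estimate(u, z, y, u0=np.zeros(2), h_inner=0.5, h_outer=0.5)
```

The point of the construction is that Z enters the inner regression even though the final estimator is a function of U alone, so Z is never needed at prediction time.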
Another example is in the area of personalized medicine, in which we would like to maximize the conditional expectation E(Y | U = u_0, a) over several treatment options a = 1, ..., k (Qian and Murphy, 2011), where u_0 is the vector of a future patient's prognostic factors and demographic variables, and Y is his or her future outcome; a larger (or smaller) Y means a better outcome. Because parametric modeling of ψ(u_0) is difficult in many applications such as personalized medicine problems, nonparametric kernel estimation of ψ(u_0) (Nadaraya, 1964; Watson, 1964) has been widely considered and used. As shown in Theorem 2.2.2 of Bierens (1987), the optimal convergence rate of a kernel estimator is n^{-m/(2m+p)}, where m is the order of the kernel and p is the dimension of U. When p is not small, it is crucial to search for a matrix B with the smallest possible column dimension d_0 < p such that

ψ(u_0) = E(Y | B^T U = B^T u_0),

where B^T is the transpose of B, and hence the optimal convergence rate is improved to n^{-m/(2m+d_0)}. This is usually achieved by using the training data to estimate a B with the smallest column dimension such that Y ⊥⊥ U | B^T U, i.e., Y and U are independent conditional on B^T U, which is referred t...
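The benefit of conditioning on B^T U rather than U can be seen in a small single-index simulation: the response depends on p = 5 covariates only through one linear combination, so the effective dimension is d_0 = 1. This is a sketch under the simplifying assumption that the true B is known; in practice B must be estimated from the training data, and the bandwidths below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-index model: Y depends on the p covariates only through B^T U,
# so the effective dimension is d0 = 1 even though p = 5.
n, p = 500, 5
B = np.zeros((p, 1))
B[0, 0] = 1.0                                 # true reduction matrix (assumed known here)
U = rng.normal(size=(n, p))
Y = np.sin(U @ B).ravel() + 0.1 * rng.normal(size=n)

def nw(x_train, y_train, x0, h):
    # Nadaraya-Watson estimator of E(Y | X = x0) with a product Gaussian kernel
    w = np.exp(-0.5 * np.sum(((x_train - x0) / h) ** 2, axis=1))
    return np.sum(w * y_train) / np.sum(w)

u0 = np.zeros(p)                              # true psi(u0) = sin(0) = 0
psi_full = nw(U, Y, u0, h=0.8)                # smooths in all p dimensions
psi_reduced = nw(U @ B, Y, B.T @ u0, h=0.3)   # smooths only in the d0 reduced directions
```

Both estimators target the same ψ(u_0), but the reduced estimator averages over far more effective neighbors in one dimension than the full estimator does in five, which is exactly the rate gain from n^{-m/(2m+p)} to n^{-m/(2m+d_0)}.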