This paper proposes a new nested algorithm (NPL) for the estimation of a class of discrete Markov decision models and studies its statistical and computational properties. Our method is based on a representation of the solution of the dynamic programming problem in the space of conditional choice probabilities. When the NPL algorithm is initialized with consistent nonparametric estimates of conditional choice probabilities, successive iterations return a sequence of estimators of the structural parameters which we call K-stage policy iteration estimators. We show that the sequence includes as extreme cases a Hotz-Miller estimator (for K = 1) and Rust's nested fixed point estimator (in the limit when K → ). Furthermore, the asymptotic distribution of all the estimators in the sequence is the same and equal to that of the maximum likelihood estimator. We illustrate the performance of our method with several examples based on Rust's bus replacement model. Monte Carlo experiments reveal a trade-off between finite sample precision and computational cost in the sequence of policy iteration estimators.