We refine the general methodology of [1] for constructing and analyzing essentially minimax estimators for a wide class of functionals of finite-dimensional parameters, and elaborate on the case of discrete distributions with support size S comparable to the number of observations n. Specifically, we determine the "smooth" and "non-smooth" regimes based on the confidence set and the smoothness of the functional. In the "non-smooth" regime, we apply an unbiased estimator of a suitable polynomial approximation of the functional. In the "smooth" regime, we construct a general version of the bias-corrected Maximum Likelihood Estimator (MLE) based on a Taylor expansion.

We apply the general methodology to the problem of estimating the KL divergence between two discrete probability measures P and Q from empirical data in a non-asymptotic and possibly large-alphabet setting. We construct minimax rate-optimal estimators of D(P‖Q) when the likelihood ratio is upper bounded by a constant that may depend on the support size, and show that the performance of the optimal estimator with n samples is essentially that of the MLE with n ln n samples. Our estimator is adaptive in the sense that it requires knowledge of neither the support size nor the upper bound on the likelihood ratio. We show that the general methodology also yields minimax rate-optimal estimators for other divergences, such as the Hellinger distance and the χ²-divergence. Our approach refines the approximation methodology recently developed for constructing near-minimax estimators of functionals of high-dimensional parameters, such as entropy, Rényi entropy, mutual information, and ℓ1 distance in large-alphabet settings, and shows that the effective sample size enlargement phenomenon holds significantly more widely than previously established.
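To make the effective sample size statement concrete, the following is a minimal Python sketch of the plug-in (MLE) estimator of D(P‖Q) that serves as the baseline above: with n samples, the minimax rate-optimal estimator performs essentially as this plug-in estimator would with n ln n samples. This is not the paper's estimator; the function names (kl_plugin, empirical_pmf), the add-one smoothing of the Q frequencies, and the synthetic pair (P, Q) with likelihood ratio bounded by 2 are illustrative assumptions only.

import numpy as np

def empirical_pmf(samples, support_size):
    """Empirical (MLE) probability mass function from i.i.d. samples on {0, ..., S-1}."""
    counts = np.bincount(samples, minlength=support_size)
    return counts / counts.sum()

def kl_plugin(p_samples, q_samples, support_size):
    """Plug-in (MLE) estimate of D(P || Q) = sum_i p_i ln(p_i / q_i).

    Baseline only: the paper's minimax estimator instead uses a polynomial
    approximation in the "non-smooth" regime and a Taylor-expansion bias
    correction in the "smooth" regime.
    """
    p_hat = empirical_pmf(p_samples, support_size)
    q_hat = empirical_pmf(q_samples, support_size)
    # Add-one smoothing on q_hat avoids division by zero on unobserved symbols;
    # the paper instead works under a bounded likelihood ratio p_i / q_i.
    n_q = len(q_samples)
    q_smoothed = (q_hat * n_q + 1) / (n_q + support_size)
    mask = p_hat > 0
    return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / q_smoothed[mask])))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, n = 1000, 5000                      # support size comparable to sample size
    p = rng.dirichlet(np.ones(S))
    q = 0.5 * p + 0.5 / S                  # keeps the likelihood ratio p_i / q_i below 2
    x = rng.choice(S, size=n, p=p)
    y = rng.choice(S, size=n, p=q)
    true_kl = float(np.sum(p * np.log(p / q)))
    print(f"true D(P||Q) = {true_kl:.4f}, plug-in estimate = {kl_plugin(x, y, S):.4f}")

In the regime where S grows with n, the gap between the plug-in estimate and the true divergence in this kind of experiment reflects the bias that the polynomial-approximation and bias-correction steps of the optimal estimator are designed to remove.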