Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where experimentation is necessarily limited. OPE/L is nonetheless sensitive to discrepancies between the data-generating environment and the one in which policies are deployed. Recent work proposed distributionally robust OPE/L (DROPE/L) to remedy this, but the proposal relies on inverse-propensity weighting, whose regret rates may deteriorate if propensities are estimated and whose variance is suboptimal even if they are known. For vanilla OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR$^2$OPE) and prove its semiparametric efficiency under weak product-rate conditions. Notably, thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for vanilla OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR$^2$OPL) and show that, under a product-rate condition involving a continuum of regressions, it enjoys a fast regret rate of $O(N^{-1/2})$ even when unknown propensities are nonparametrically estimated. We further extend our results to general $f$-divergence uncertainty sets. We illustrate the advantage of our algorithms in simulations.
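To make the "worst-case expectation" concrete, the following is a minimal sketch of an inverse-propensity-weighted baseline of the kind the abstract contrasts against. It is not LDR$^2$OPE or CDR$^2$OPL. It assumes the standard dual representation of a KL-constrained worst-case mean, $\inf_{Q:\,\mathrm{KL}(Q\|P)\le\delta} E_Q[R] = \sup_{\alpha>0}\{-\alpha\log E_P[e^{-R/\alpha}] - \alpha\delta\}$, and replaces $E_P$ with a self-normalized weighted average. The function name `kl_robust_value`, the search bounds `alpha_lo` and `alpha_hi`, and the toy data are all hypothetical choices made for illustration.

```python
import math


def kl_robust_value(rewards, weights, delta, alpha_lo=1e-3, alpha_hi=1e3, iters=200):
    """Weighted plug-in estimate of inf_{KL(Q||P) <= delta} E_Q[R] via its dual.

    `weights` are importance weights (assumed here to be of the form
    1{A_i = pi(X_i)} / pi_0(A_i | X_i)); they are self-normalized.
    The dual variable alpha is restricted to [alpha_lo, alpha_hi] (an assumption).
    """
    total_w = sum(weights)
    # Samples with zero weight do not contribute to the weighted average.
    pairs = [(r, w) for r, w in zip(rewards, weights) if w > 0]

    def dual(alpha):
        # log of the weighted mean of exp(-r / alpha), shifted for stability
        shift = max(-r / alpha for r, _ in pairs)
        s = sum(w * math.exp(-r / alpha - shift) for r, w in pairs)
        log_mgf = shift + math.log(s / total_w)
        return -alpha * log_mgf - alpha * delta

    # The dual objective is concave in alpha, hence unimodal in log(alpha),
    # so a ternary search over log(alpha) locates its maximum on the interval.
    lo, hi = math.log(alpha_lo), math.log(alpha_hi)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual(math.exp(m1)) < dual(math.exp(m2)):
            lo = m1
        else:
            hi = m2
    return dual(math.exp(0.5 * (lo + hi)))


# Hypothetical logged data: rewards, logging propensities, and whether the
# logged action matches the target policy's action.
rewards = [0.0, 1.0, 1.0, 0.5]
propensities = [0.5, 0.25, 0.5, 0.5]
matches = [1, 1, 0, 1]
weights = [m / p for m, p in zip(matches, propensities)]

nominal = kl_robust_value(rewards, weights, delta=0.0)  # close to the weighted mean
robust = kl_robust_value(rewards, weights, delta=0.1)   # pessimistic value
```

Setting `delta=0` approximately recovers the ordinary self-normalized IPW value, and increasing `delta` can only lower the estimate. Because this sketch depends on the propensities alone, it inherits the sensitivity to propensity estimation that motivates the doubly robust constructions described above.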