Deep learning (DL) has emerged as a promising tool for downscaling climate projections to regional-to-local scales from large-scale atmospheric fields, following the perfect-prognosis approach. Given the complexity of these models, it is crucial to evaluate them properly, especially when they are applied under changing climatic conditions, where the ability to extrapolate/generalize is key. In this work, we intercompare several DL models from the literature on the same challenging use case (downscaling temperature over the CORDEX North America domain) and extend standard evaluation methods with eXplainable Artificial Intelligence (XAI) techniques. Specifically, we introduce two novel XAI-based diagnostics, Aggregated Saliency Map and Saliency Dispersion Maps, and show how they can be used to unravel the internal behavior of these models, aiding in their design and evaluation. This work advocates for incorporating XAI techniques into deep downscaling evaluation frameworks, especially when working with large regions and/or under climate change conditions.