Artificial intelligence methods such as deep neural networks promise unprecedented capabilities in healthcare, from diagnosing diseases to prescribing treatments. While this can eventually yield a valuable suite of tools for automating clinical workflows, a critical step forward is to ensure that predictive models are reliable and to enable rigorous introspection of their behavior. This need has led to the design of explainable AI techniques aimed at uncovering the relationships between discernible data signatures and the decisions of machine-learned models, and at characterizing the strengths and weaknesses of those models. In this context, so-called counterfactual explanations, which synthesize small, interpretable changes to a given query sample while producing desired changes in model predictions to support user-specified hypotheses (e.g., a progressive change in disease severity), have become popular. However, when a model's predictions are not well calibrated (i.e., the prediction confidences are not indicative of the likelihood that the predictions are correct), the inverse problem of synthesizing counterfactuals can produce explanations with irrelevant feature manipulations. Hence, in this paper, we propose to leverage prediction uncertainties from the learned models to better guide this optimization. To this end, we present TraCE (Training Calibration-based Explainers), a counterfactual generation approach for deep models in medical image analysis, which utilizes pre-trained generative models and a novel uncertainty-based interval calibration strategy to synthesize hypothesis-driven explanations. By leveraging uncertainty estimates in the optimization process, TraCE consistently produces meaningful counterfactual evidence and elucidates the complex decision boundaries learned by deep classifiers.
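To make the core idea concrete, the following is a minimal, illustrative sketch of uncertainty-aware counterfactual search, not TraCE itself: it uses a toy logistic classifier in place of a deep model, performs the search directly in input space rather than in a generative model's latent space, and substitutes a simple predictive-entropy penalty for TraCE's interval calibration strategy. All function and variable names here are hypothetical.

```python
import numpy as np

# Toy differentiable classifier p(y=1|x) = sigmoid(w.x + b), standing in
# for a deep model. Weights are arbitrary illustrative values.
w = np.array([2.0, -1.0, 0.5])
b = -0.25

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return sigmoid(w @ x + b)

def counterfactual(x0, target=1.0, lam_prox=0.1, lam_unc=0.5,
                   lr=0.1, steps=500):
    """Gradient search for a sample near x0 whose prediction approaches
    `target`. A predictive-entropy penalty (a crude stand-in for a model
    uncertainty estimate) discourages counterfactuals that land in
    low-confidence regions, and a proximity term keeps the edit small."""
    x = x0.copy()
    for _ in range(steps):
        p = predict(x)
        # gradient of the squared prediction loss (p - target)^2
        grad_pred = 2.0 * (p - target) * p * (1.0 - p) * w
        # gradient of binary entropy H(p) = -p log p - (1-p) log(1-p):
        # dH/dp = log((1-p)/p), dp/dx = p(1-p) w
        grad_unc = np.log((1.0 - p) / p) * p * (1.0 - p) * w
        # L2 proximity to the query sample
        grad_prox = 2.0 * (x - x0)
        x -= lr * (grad_pred + lam_unc * grad_unc + lam_prox * grad_prox)
    return x
```

For example, starting from a query `x0 = np.array([-0.5, 0.5, 0.0])` with a low predicted probability, `counterfactual(x0)` returns a nearby sample whose prediction has crossed to the confident side of the decision boundary. Dropping the entropy term recovers a plain proximity-constrained counterfactual, which is the kind of search that poorly calibrated confidences can mislead.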
Furthermore, we demonstrate the effectiveness of TraCE in revealing intricate relationships between different patient attributes and in detecting shortcuts, arising from unintended biases, in learned models. Given the widespread adoption of machine-learned solutions in radiology, our study focuses on deep models used to identify anomalies in chest X-ray images. Through rigorous empirical studies, we demonstrate the superiority of TraCE explanations over several state-of-the-art baseline approaches on widely adopted evaluation metrics for counterfactual reasoning. Our findings show that TraCE enables a holistic understanding of deep models by supporting progressive exploration of decision boundaries, detecting shortcuts, and inferring relationships between patient attributes and disease severity.