Objective. Deep learning models that aid in medical image assessment tasks must be both accurate and reliable to be deployed within clinical settings. While deep learning models have been shown to be highly accurate across a variety of tasks, measures that indicate the reliability of these models are less established. Increasingly, uncertainty quantification (UQ) methods are being introduced to inform users on the reliability of model outputs. However, most existing methods cannot be augmented to previously validated models because they are not post hoc, and they change a model’s output. In this work, we overcome these limitations by introducing a novel post hoc UQ method, termed Local Gradients UQ, and demonstrate its utility for deep learning-based metastatic disease delineation. Approach. This method leverages a trained model’s localized gradient space to assess sensitivities to trained model parameters. We compared the Local Gradients UQ method to non-gradient measures defined using model probability outputs. The performance of each uncertainty measure was assessed in four clinically relevant experiments: (1) response to artificially degraded image quality, (2) comparison between matched high- and low-quality clinical images, (3) false positive (FP) filtering, and (4) correspondence with physician-rated disease likelihood. Main results. (1) Response to artificially degraded image quality was enhanced by the Local Gradients UQ method, where the median percent difference between matching lesions in non-degraded and most degraded images was consistently higher for the Local Gradients uncertainty measure than the non-gradient uncertainty measures (e.g., 62.35% vs. 2.16% for additive Gaussian noise). (2) The Local Gradients UQ measure responded better to high- and low-quality clinical images (p<0.05 vs p>0.1 for both non-gradient uncertainty measures). (3) FP filtering performance was enhanced by the Local Gradients UQ method when compared to the non-gradient methods, increasing the area under the receiver operating characteristic curve (ROC AUC) by 20.1% and decreasing the false positive rate by 26%. (4) The Local Gradients UQ method also showed more favorable correspondence with physician-rated likelihood for malignant lesions by increasing ROC AUC for correspondence with physician-rated disease likelihood by 16.2%. Significance. In summary, this work introduces and validates a novel gradient-based UQ method for deep learning-based medical image assessments to enhance user trust when using deployed clinical models.