As forest fire activity increases worldwide, it is important to track changing patterns of burn severity (i.e., degree of fire‐caused ecological change). Satellite data provide critical information across space and time, yet how satellite indices relate to individual measures of burn severity on the ground (e.g., tree mortality or surface charring) and how these relationships change across biophysical gradients remain unclear. To address these knowledge gaps, we used Bayesian hierarchical zero‐one‐inflated beta (ZOIB) regression models with nearly 600 plots of individual field measures of burn severity distributed across the U.S. Rocky Mountains. We asked the following: How do three commonly used satellite indices of burn severity relate to individual field measures of canopy burn severity and forest‐floor burn severity (Q1)? Then, using the highest ranked satellite index, how is reliability affected by biophysical gradients that can be captured in accessible geospatial data (e.g., latitude, slope) (Q2) and stand‐structure data typically available only with field data (Q3)? The Relative differenced Normalized Burn Ratio (RdNBR) outperformed the differenced Normalized Burn Ratio (dNBR) and the Relative Burn Ratio (RBR) across canopy and forest‐floor measures of burn severity, but differences among index performances were minor. Overall, indices performed better for field measures of canopy burn severity than for forest‐floor measures. The relationship between RdNBR and individual field measures of burn severity changed across several biophysical gradients. For example, the same value of RdNBR corresponded to different field levels of burn severity depending on latitude, pre‐fire forest structure, and pre‐fire beetle outbreaks—and effects of biophysical gradients were often different for canopy vs. forest‐floor measures of burn severity. 
We show that estimating field measures of burn severity using satellite indices can be improved by including biophysical information, but if variables that are difficult to obtain without field data (e.g., pre‐fire beetle outbreak severity) are lacking, we suggest caution in interpreting satellite indices of burn severity across gradients of pre‐fire biophysical conditions. Finally, using an example fire, we illustrate contrasting maps of burn severity that arise from differences in the relationship between individual field measures of burn severity and RdNBR after accounting for error in those relationships.
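For readers unfamiliar with the three satellite indices compared above, the following is a minimal sketch of how they are commonly computed from pre-fire and post-fire near-infrared (NIR) and shortwave-infrared (SWIR) reflectance. The formulas are the widely cited definitions, not taken from this abstract, and the exact variant used in the study (for example, any phenological offset subtracted from dNBR) is not stated here. The function names, argument names, and the x1000 scaling are illustrative assumptions.

```python
import math


def nbr(nir, swir):
    """Normalized Burn Ratio from NIR and SWIR surface reflectance."""
    return (nir - swir) / (nir + swir)


def burn_indices(nir_pre, swir_pre, nir_post, swir_post):
    """Return dNBR, RdNBR, and RBR for one pixel.

    Assumptions (not from the source text): dNBR is scaled by 1000,
    pre-fire NBR is left unscaled in the denominators, and no dNBR
    offset is applied.
    """
    nbr_pre = nbr(nir_pre, swir_pre)
    nbr_post = nbr(nir_post, swir_post)

    # Absolute change in NBR (scaled by 1000 by convention).
    dnbr = (nbr_pre - nbr_post) * 1000.0

    # RdNBR relativizes by the square root of |pre-fire NBR|;
    # it is undefined when pre-fire NBR is exactly zero.
    if nbr_pre == 0:
        rdnbr = float("nan")
    else:
        rdnbr = dnbr / math.sqrt(abs(nbr_pre))

    # RBR relativizes by (pre-fire NBR + 1.001), which keeps the
    # denominator positive for any NBR in [-1, 1].
    rbr = dnbr / (nbr_pre + 1.001)

    return {"dNBR": dnbr, "RdNBR": rdnbr, "RBR": rbr}


# Hypothetical reflectance values for a single burned pixel.
example = burn_indices(nir_pre=0.4, swir_pre=0.1, nir_post=0.2, swir_post=0.3)
```

Because RdNBR and RBR both divide dNBR by a function of pre-fire NBR, two pixels with the same absolute change can receive different relativized values depending on pre-fire vegetation. This is one reason the correspondence between an index value and a field measure may shift across pre-fire conditions.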