A defining aspect of tonality in Western music is that different pitches are perceived to have different stabilities: listeners expect unstable pitches to resolve to more stable ones, the most stable being the tonic. To investigate possible explanations for these hierarchies of tonal stability, we conducted three experiments in which participants rated the ‘fit’ and ‘stability’ of probe tones presented in the context of a variety of familiar and unfamiliar scales in 12-tone equal temperament, and the ‘stability’ of probe tones presented in the context of unfamiliar scales in 22-tone equal temperament. The pitches of each context scale were presented in random order to minimize tonal cues other than scale structure. Using Bayesian multilevel regression, we modelled the ratings with an acoustical feature (spectral pitch class similarity), a culture-dependent feature (scale-degree prevalence in a culturally appropriate corpus), and several covariates. Across all scales, spectral pitch class similarity strongly predicts the responses; for the familiar scales, where corpus data are available, prevalence makes an additional, independent contribution. Furthermore, spectral pitch class similarity predicts stability better than a simple binary indicator of whether the probe’s pitch belongs to the context scale. These findings show that, for Western-enculturated listeners, spectral pitch class similarity approximates the perceived stability of non-simultaneous pitches, much as spectral features such as roughness and harmonicity approximate the perceived stability of simultaneous pitches.
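To make the acoustical predictor more concrete, here is a minimal sketch of how a spectral pitch class similarity between a probe tone and a context scale could be computed. It assumes a common formulation that the abstract does not spell out: each tone contributes harmonic partials weighted by n^-rho, the partials are folded into one octave and smoothed with a circular Gaussian of width sigma cents, and similarity is the cosine between the probe's vector and the summed vector of the scale tones. The function names and the parameter values (12 harmonics, rho = 0.75, sigma = 10 cents) are hypothetical placeholders, not the values fitted in the study. Because pitches are given in cents, the same code handles 12-TET and 22-TET scales.

```python
import math


def spectral_pc_vector(tones_cents, n_harmonics=12, rho=0.75, sigma=10.0, bins=1200):
    """Hypothetical spectral pitch class vector (one bin per cent by default).

    Assumption: partial n of each tone has weight n**-rho and is smeared
    over the octave-folded bins with a circular Gaussian of width sigma cents.
    """
    vec = [0.0] * bins
    cents_per_bin = 1200.0 / bins
    for tone in tones_cents:
        for n in range(1, n_harmonics + 1):
            # Partial n lies 1200*log2(n) cents above the fundamental; fold into one octave.
            pc = (tone + 1200.0 * math.log2(n)) % 1200.0
            weight = n ** -rho
            for b in range(bins):
                # Circular distance (in cents) between this bin and the partial's pitch class.
                d = abs(b * cents_per_bin - pc)
                d = min(d, 1200.0 - d)
                vec[b] += weight * math.exp(-d * d / (2.0 * sigma * sigma))
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length nonnegative vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def spcs_profile(probes_cents, scale_cents, **params):
    """Similarity of each probe tone to the whole context scale (hypothetical helper)."""
    scale_vec = spectral_pc_vector(scale_cents, **params)
    return [cosine(spectral_pc_vector([p], **params), scale_vec) for p in probes_cents]


# Example: probe all 12 chromatic pitch classes against the C major scale (in cents).
C_MAJOR = [0, 200, 400, 500, 700, 900, 1100]
c_major_profile = spcs_profile([100 * k for k in range(12)], C_MAJOR)
```

In this sketch, an in-scale probe such as C (index 0) scores higher than an out-of-scale neighbour such as C♯ (index 1), yet, unlike a binary in-scale indicator, the measure also grades probes by how well their partials line up with the partials of the scale tones.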