Auditory verbal hallucinations (AVH) – or hearing voices – occur in clinical and non-clinical populations, but their mechanisms remain unclear. Predictive processing models of psychosis have proposed that hallucinations arise from an over-weighting of prior expectations in perception. It is unknown, however, whether this reflects i) a sensitivity to explicit modulation of prior knowledge, or ii) a pre-existing tendency to spontaneously use such knowledge more in ambiguous contexts. Four experiments were conducted to examine this question in healthy participants listening to ambiguous speech stimuli. In Experiments 1 (n = 60) and 2 (n = 60), participants discriminated between intelligible and unintelligible sine-wave speech (SWS) before and after exposure to the original language templates (i.e., a modulation of expectation). No relationship was observed between top-down modulation and two common measures of hallucination-proneness. Experiment 3 (n = 99) confirmed this pattern with a different stimulus – sine-vocoded speech (SVS) – that was designed to minimise ceiling effects in discrimination and more closely model previous top-down effects reported in psychosis. In Experiment 4 (n = 135), participants were exposed to SVS without prior knowledge that it contained speech (i.e., naïve listening). AVH-proneness significantly predicted spontaneous pre-exposure identification of speech, but was unrelated to performance on a subsequent post-exposure discrimination task. Altogether, these findings support a pre-existing tendency to spontaneously draw upon prior knowledge in healthy people prone to AVH, rather than a sensitivity to temporary modulations of expectation. We propose a model of clinical and non-clinical hallucinations, across auditory and visual modalities, with testable predictions for future research.