Two competing accounts propose that the disruption of short-term memory by irrelevant speech arises from either interference-by-process (e.g., the changing-state effect) or attentional capture, but it is unclear how whispering affects the irrelevant speech effect. According to the interference-by-process account, whispered speech should be less disruptive because of its reduced periodic spectro-temporal fine structure and lower amplitude modulations. In contrast, the attentional account predicts more disruption by whispered speech, possibly via enhanced listening effort when the language is comprehended. In two experiments, voiced and whispered speech (spoken sentences or monosyllabic words) was presented while participants memorized the order of visually presented letters. In both experiments, a changing-state effect was observed regardless of phonation: sentences produced more disruption than “steady-state” words. Moreover, whispered speech (which has lower fluctuation strength) was more disruptive than voiced speech when participants understood the language (Experiment 1), but not when the language was incomprehensible (Experiment 2). The results suggest two functionally distinct mechanisms of auditory distraction: Whereas changing-state speech causes automatic interference with seriation processes regardless of its meaning or intelligibility, whispered speech appears to contain cues that divert attention from the focal task primarily when it is presented in a comprehended language, possibly via enhanced listening effort.