This study introduces a Quality of Experience (QoE) model of loudspeaker-based speech reproduction, which specifies quality elements and quality features relevant to Overall Listening Experience (OLE) and Quality of Service (QoS), respectively. Assumptions about the relations between selected quality elements and quality features were validated in a listening-only test, in which participants had to identify the voices of two different talkers behaviorally. The talkers took turns uttering sentences. In the non-spatial mode, all sentences were reproduced through a central loudspeaker only. In the spatial mode, each sentence was reproduced through either the central loudspeaker or a lateral loudspeaker assigned to that talker. The transmitted speech signals were either clean, superimposed with background noise, or bandpass-filtered. Transmission quality, but not reproduction mode, significantly influenced evaluative (speech quality, speech intelligibility) and immersive (voice naturalness, spatial presence, social presence) aspects of the listening experience. Unexpectedly, and contrary to prior evidence, the spatial mode did not reduce the mental effort of talker identification. The results suggest that noticeable advantages of spatial hearing in speech reproduction manifest only in more complex listening situations. Moreover, the subjective measures employed (category rating scales) may not have been sensitive enough to capture more subtle variation in behavioral task performance.