Objective. By means of subjective psychophysical methods, quality of transmitted speech has been decomposed into three perceptual dimensions named 'discontinuity' (F), 'noisiness' (N) and 'coloration' (C). Previous studies using electroencephalography (EEG) already reported effects of perceived intensity of single quality dimensions on electrical brain activity. However, it has not yet been investigated whether the dimensions themselves are dissociable on a neurophysiological level of analysis. Approach. Pursuing this goal in the present study, a high-quality (HQ) recording of a spoken word was degraded on one dimension at a time, resulting in three quality-impaired stimuli (F, N, C) which were on average described as being equal in perceived degradation intensity. Participants performed a three-stimulus oddball task, involving the serial presentation of different stimulus types: (1) HQ or degraded 'standard' stimuli to establish sensory/perceptual quality references, and (2) degraded 'oddball' stimuli to cause random, infrequent deviations from those references. EEG was employed to examine the neuro-electrical correlates of speech quality perception. Main results. Emphasis was placed on modulations in temporal and morphological characteristics of the P300 component of the event-related brain potential (ERP), whose subcomponents P3a and P3b are commonly linked to attentional orienting and task relevance categorization, respectively. Electrophysiological data analysis (N = 28) revealed significant modulations of P300 amplitude and latency by the perceptual dimensions underlying both quality references and oddball stimuli. Significance. The present study exemplifies the utility of physiological methods like EEG for dissociating speech degradations not only based on perceived intensity level, but also based on their distinctive quality dimension.
Quality of transmitted speech can be decomposed into three perceptual dimensions: noisiness, coloration and discontinuity (Wältermann, Dimension-based quality modeling of transmitted speech. Springer, Berlin. doi:10.1007/978-3-642-35019-1, 2013). The purpose of the present study was to explore whether degradation of speech quality on each perceptual dimension affected the morphological and temporal characteristics of the P300 event-related brain potential (ERP) component. The P300 is composed of two subcomponents, P3a and P3b, which served as neurophysiological indicators of distinct processes in human quality perception (Polich, Neuropsychology of P300. Oxford University Press, Oxford. doi:10.1093/oxfordhb/9780195374148.013.0089, 2012; Raake and Egger, Quality and quality of experience. Springer International Publishing, Cham, pp. 11-33, 2014): While the earlier P3a reflects attentional processing after the occurrence of novel sensory events, the later P3b is associated with memory operations following the detection of task-relevant stimuli. Electroencephalography (EEG) was used to record the electrical brain activity of subjects (N = 24) performing a three-stimulus oddball task. Degraded stimuli were generated from the audio recording of a spoken word. The analysis of P3a- and P3b-related activity at electrode positions Fz, Cz and Pz provided support for the existence of different perceptual references for quality-impaired vs. high-quality stimulus contexts as well as quality degradations on single perceptual dimensions.
Objective. Non-invasive physiological methods like electroencephalography (EEG) are increasingly employed to assess human information processing during exposure to multimedia signals. In the quality engineering field, previous research has promoted the utility of the P300 event-related brain potential (ERP) component for indicating variation in quality perception. The present study provides a starting point to test whether the P300 and its two subcomponents, P3a and P3b, are truly reflective of changes in the perceived quality of transmitted speech signals given the presence of other, quality-unrelated changes in acoustic stimulation. Approach. High-quality and degraded variants of spoken words were presented in a two-feature oddball task, which required participants to actively respond to rarely occurring ‘target’ stimuli within a series of frequent ‘standard’ stimuli, thereby eliciting ERP waveforms. Target presentations involved either single quality changes or concurrent double changes in quality and the initial phoneme. Main results. In case additional phonological change was present, only varying quality of standard stimuli caused significant modulations in P3a and P3b characteristics (N = 32). Thus, the formation of different short-term quality references exerted a persisting influence on the auditory processing of transmitted speech. Significance. The obtained results elucidate the importance of contextual and content-related influencing factors for proving the validity of the P300 as a psychophysiological indicator of speech quality change. Associated questions regarding the transfer of ERP-based quality assessment into more practically relevant measurement contexts are discussed.
This study introduces a Quality of Experience (QoE) model of loudspeaker-based speech reproduction, which specifies quality elements and quality features relevant to Overall Listening Experience (OLE) and Quality of Service (QoS), respectively. Assumptions about the relations between selected quality elements and quality features were validated in a listening-only test. Participants had the task to behaviorally identify the voices of two different talkers. The talkers took turns in uttering sentences through only a central loudspeaker (non-spatial mode) versus through either the central or one talker-specific lateral loudspeaker (spatial mode). The quality of the transmitted speech signals was either clean, superimposed with background noise or bandpass-filtered. It was demonstrated that transmission quality, but not reproduction mode, significantly influenced evaluative (speech quality, speech intelligibility) and immersive (voice naturalness, spatial presence, social presence) aspects of listening experience. Unexpectedly, the spatial mode did not reduce the mental effort of talker identification, as opposed to prior evidence. The results suggest that noticeable advantages of spatial hearing in speech reproduction only manifest in listening situations of higher complexity. Moreover, the employed subjective measures (category rating scales) might not have been sensitive enough to capture more subtle variation in behavioral task performance.