Spatial audio processes (SAPs) commonly encountered in consumer audio reproduction systems are known to generate a range of impairments to spatial quality. Two listening tests (involving two listening positions, six 5-channel audio recordings, and 48 SAPs) indicate that the degree of quality degradation is determined largely by the nature of the SAP but that the effect of a particular SAP can depend on program material and on listening position. Combining off-center listening with another SAP can reduce spatial quality significantly compared to auditioning that SAP centrally. These findings, and the associated listening test data, can guide the development of an artificial-listener-based spatial audio quality evaluation system.
Previously-obtained data, quantifying the degree of quality degradation resulting from a range of spatial audio processes (SAPs), can be used to build a regression model of perceived spatial audio quality in terms of previously developed spatially and timbrally relevant metrics. A generalizable model thus built, employing just five metrics and two principal components, performs well in its prediction of the quality of a range of program types degraded by a multitude of SAPs commonly encountered in consumer audio reproduction, auditioned at both central and off-center listening positions. Such a model can provide a correlation to listening test data of r = 0.89, with a root mean square error (RMSE) of 11%, making its performance comparable to that of previous audio quality models and making it a suitable core for an artificial-listener-based spatial audio quality evaluation system.
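As a rough illustration of the kind of model described above, the following sketch fits a principal component regression (five input metrics reduced to two principal components, followed by ordinary least squares) and reports the correlation and RMSE between predicted and target scores. It is not the authors' implementation: the data are synthetic, the metric values and quality scores are invented for the example, and the helper names (`fit_pcr`, `predict_pcr`, `pearson_r`, `rmse`) are hypothetical.

```python
import numpy as np


def fit_pcr(X, y, n_components=2):
    """Fit a principal component regression (assumed form, for illustration).

    X: (n_items, n_metrics) metric values; y: (n_items,) quality scores.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std  # standardize each metric
    # Rows of Vt are the principal directions of the standardized metrics.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T  # (n_metrics, n_components)
    T = Z @ W  # principal component scores
    A = np.column_stack([np.ones(len(T)), T])  # intercept + scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "W": W, "coef": coef}


def predict_pcr(model, X):
    """Predict quality scores for new metric values."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["W"]
    return model["coef"][0] + T @ model["coef"][1:]


def pearson_r(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])


def rmse(a, b):
    """Root mean square error, in the same units as the scores."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic stand-in data (NOT listening test data): five correlated
# "metrics" driven by two latent factors, plus a noisy quality score.
rng = np.random.default_rng(0)
n_items = 200
latent = rng.normal(size=(n_items, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(n_items, 5))
y = 50 + 15 * latent[:, 0] - 8 * latent[:, 1] + 3 * rng.normal(size=n_items)

model = fit_pcr(X, y, n_components=2)
y_hat = predict_pcr(model, X)
r = pearson_r(y, y_hat)
err = rmse(y, y_hat)
print(f"r = {r:.2f}, RMSE = {err:.2f} score units")
```

The figures printed here are in-sample and describe only the synthetic data; assessing generalizability in the sense used above would require evaluating the fitted model on items that were held out of the fit.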
INTRODUCTION

A previous study [1] made the case for a new artificial-listener-based evaluation system capable of predicting the perceived quality degradations resulting from spatial audio processes (SAPs) commonly encountered in consumer audio multichannel loudspeaker reproduction systems (e.g., downmixing, multichannel coding, loudspeaker misplacement); it explained how such a system would be useful for quickly assessing overall spatial sound quality for research, product development, and quality control where assessment by a listening panel would be impractical or impossible. That study determined the degree of quality degradation resulting from a wide range of such SAPs and the influences of listening position and source material on that degradation. The research reported in the current paper will determine whether these findings can be used to build a regression model of perceived spatial audio quality, in terms of previously-developed metrics, that can form the core of the above-mentioned evaluation system.

The intended system, named QESTRAL (Quality Evaluation of Spatial Transmission and Reproduction using an Artificial Listener), was proposed previously by Rumsey et al. [2] and, like PEAQ (Perceptual Evaluation of Audio Quality) [3], it will use an intrusive evaluation method to compare a reference version of the signal with one impaired by a SAP. Also like PEAQ, and the spatial hearing model developed by Mason [4], the QESTRAL system will employ specifically-synthesized audio probe signals, rather than analyzing real program material. These will be rendered via the SAP-degraded system and captured binaurally at the listening position, initially in a computer-simulated anechoic environment. Metrics will be applied to the captured signals and the results of these metrics will feed the regression