Previously obtained data quantifying the degree of quality degradation resulting from a range of spatial audio processes (SAPs) can be used to build a regression model of perceived spatial audio quality in terms of previously developed spatially and timbrally relevant metrics. A generalizable model built in this way, employing just five metrics and two principal components, performs well in predicting the quality of a range of program types degraded by a multitude of SAPs commonly encountered in consumer audio reproduction, auditioned at both central and off-center listening positions. Such a model can achieve a correlation with listening test data of r = 0.89 and a root mean square error (RMSE) of 11%, making its performance comparable to that of previous audio quality models and making it a suitable core for an artificial-listener-based spatial audio quality evaluation system.
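As an illustration of the kind of model summarized above, the following minimal sketch fits a regression on two principal components derived from five metrics and reports the correlation r and the RMSE against the target scores. It assumes principal component regression, which is only one plausible reading of "five metrics and two principal components"; the regression method actually used may differ. The data are synthetic placeholders, not the listening test data, and the function names (`fit_pcr`, `predict_pcr`, `evaluate`) are hypothetical.

```python
import numpy as np

def fit_pcr(X, y, n_components=2):
    """Principal component regression (assumed method, see lead-in)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std  # standardize each metric
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components].T  # shape (n_metrics, n_components)
    scores = Z @ components
    # Ordinary least squares on the component scores, with an intercept.
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "components": components, "coef": coef}

def predict_pcr(model, X):
    """Predict quality scores for new metric values."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"]
    return model["coef"][0] + scores @ model["coef"][1:]

def evaluate(y_true, y_pred):
    """Return Pearson correlation r and root mean square error."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r, rmse

# Synthetic stand-in data: 120 hypothetical stimuli, 5 hypothetical metrics
# driven by 2 latent factors. These are NOT the paper's data.
rng = np.random.default_rng(0)
n_items, n_metrics = 120, 5
latent = rng.normal(size=(n_items, 2))
mixing = rng.normal(size=(2, n_metrics))
X = latent @ mixing + 0.1 * rng.normal(size=(n_items, n_metrics))
y = 50 + 15 * latent[:, 0] - 8 * latent[:, 1] + 5 * rng.normal(size=n_items)

model = fit_pcr(X, y, n_components=2)
y_hat = predict_pcr(model, X)
r, rmse = evaluate(y, y_hat)
print(f"r = {r:.2f}, RMSE = {rmse:.1f} (in-sample, synthetic data)")
```

The figures printed here are in-sample and describe only the synthetic data. Assessing generalizability, as the abstract claims for the actual model, would require evaluation on stimuli not used for fitting.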
INTRODUCTION

A previous study [1] made the case for a new artificial-listener-based evaluation system capable of predicting the perceived quality degradations resulting from spatial audio processes (SAPs) commonly encountered in consumer audio multichannel loudspeaker reproduction systems (e.g., downmixing, multichannel coding, loudspeaker misplacement); it explained how such a system would be useful for quickly assessing overall spatial sound quality for research, product development, and quality control where assessment by a listening panel would be impractical or impossible. That study determined the degree of quality degradation resulting from a wide range of such SAPs and the influences of listening position and source material on that degradation. The research reported in the current paper will determine whether these findings can be used to build a regression model of perceived spatial audio quality, in terms of previously developed metrics, that can form the core of the above-mentioned evaluation system.

The intended system, named QESTRAL (Quality Evaluation of Spatial Transmission and Reproduction using an Artificial Listener), was proposed previously by Rumsey et al. [2] and, like PEAQ (Perceptual Evaluation of Audio Quality) [3], it will use an intrusive evaluation method to compare a reference version of the signal with one impaired by a SAP. Also like PEAQ, and the spatial hearing model developed by Mason [4], the QESTRAL system will employ specifically synthesized audio probe signals, rather than analyzing real program material. These will be rendered via the SAP-degraded system and captured binaurally at the listening position, initially in a computer-simulated anechoic environment. Metrics will be applied to the captured signals and the results of these metrics will feed the regression