Abstract: Reverberation continues to present a major problem for sound source separation algorithms. However, humans demonstrate a remarkable robustness to reverberation, and many of the psychophysical and perceptual mechanisms involved are well documented. The precedence effect is one of these mechanisms; it aids our ability to localise sounds in reverberation. Despite this, relatively little work has been done on incorporating the precedence effect into automated source separation. Furthermore, no work has been carried out on adapting a precedence model to the acoustic conditions under test, and it is unclear whether such adaptation, analogous to the perceptual Clifton effect, is even necessary. Hence, this study tests a previously proposed binaural separation/precedence model in real rooms with a range of reverberant conditions. The inhibitory time constant and inhibitory gain of the precedence model are varied in each room in order to establish the necessity for adaptation to the acoustic conditions. The study concludes that adaptation is necessary and can yield significant gains in separation performance. Furthermore, it is shown that the Initial Time Delay Gap and the Direct-to-Reverberant Ratio are important factors when considering this adaptation.
The ideal binary mask (IBM) is widely considered to be the benchmark for time-frequency-based sound source separation techniques such as computational auditory scene analysis (CASA). However, it is well known that binary masking introduces objectionable distortion, especially musical noise. This can make binary masking unsuitable for sound source separation applications where the output is auditioned. It has been suggested that soft masking reduces musical noise and leads to a higher quality output. A previously defined soft mask, the ideal ratio mask (IRM), is found to have similar properties to the IBM, may correspond more closely to auditory processes, and offers additional computational advantages. Consequently, the IRM is proposed as the goal of CASA. To further support this position, a number of studies are reviewed that show soft masks to provide superior performance to the IBM in applications such as automatic speech recognition and speech intelligibility. A brief empirical study provides additional evidence demonstrating the objective and perceptual superiority of the IRM over the IBM.
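The contrast between the two masks can be illustrated with a minimal sketch. It assumes the commonly used definitions (the IBM keeps a time-frequency unit when the target energy exceeds the interferer energy by a local criterion, and the IRM is the ratio of target energy to total energy in the unit); the abstract itself does not give formulas, so the function names, the `lc_db` and `eps` parameters, and the toy energy values below are all illustrative assumptions rather than the authors' implementation.

```python
def ideal_binary_mask(target_energy, interferer_energy, lc_db=0.0):
    """Binary mask: 1 where the local target-to-interferer ratio
    exceeds the local criterion lc_db (in dB), otherwise 0."""
    threshold = 10.0 ** (lc_db / 10.0)  # convert dB criterion to an energy ratio
    return [
        [1.0 if t > threshold * n else 0.0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, interferer_energy)
    ]


def ideal_ratio_mask(target_energy, interferer_energy, eps=1e-12):
    """Soft mask: fraction of each unit's energy that belongs to the target.
    eps (an assumption) guards against division by zero in silent units."""
    return [
        [t / (t + n + eps) for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, interferer_energy)
    ]


# Hypothetical 2x2 grid of time-frequency energies (rows: channels, cols: frames).
target = [[4.0, 1.0], [0.5, 9.0]]
interferer = [[1.0, 1.0], [2.0, 1.0]]

ibm = ideal_binary_mask(target, interferer)
irm = ideal_ratio_mask(target, interferer)
```

In this toy case the IBM makes a hard keep-or-discard decision for every unit, whereas the IRM assigns intermediate weights (for example, 0.5 where target and interferer energies are equal), which is the property usually credited with reducing musical noise.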
Abstract: A number of metrics have been proposed in the literature to assess sound source separation algorithms. The addition of convolutional distortion raises further questions about the assessment of source separation algorithms in reverberant conditions, as reverberation is shown to undermine the optimality of the ideal binary mask (IBM) in terms of signal-to-noise ratio (SNR). Furthermore, with a range of mixture parameters common across numerous acoustic conditions, SNR-based metrics demonstrate an inconsistency that can only be attributed to the convolutional distortion. This suggests the necessity for an alternative metric in the presence of convolutional distortion, such as reverberation. Consequently, a novel metric, dubbed the IBM ratio (IBMR), is proposed for assessing source separation algorithms that aim to calculate the IBM. The metric is robust to many of the effects of convolutional distortion on the output of the system and may provide a more representative insight into the performance of a given algorithm.
Abstract. Source separation evaluation is typically a top-down process, starting with perceptual measures which capture fitness-for-purpose and followed by attempts to find physical (objective) measures that are predictive of the perceptual measures. In this paper, we take a contrasting bottom-up approach. We begin with the physical measures provided by the Blind Source Separation Evaluation Toolkit (BSS Eval) and we then look for corresponding perceptual correlates. This approach is known as psychophysics and has the distinct advantage of leading to interpretable, psychophysical models. We obtained perceptual similarity judgments from listeners in two experiments featuring vocal sources within musical mixtures. In the first experiment, listeners compared the overall quality of vocal signals estimated from musical mixtures using a range of competing source separation methods. In a loudness experiment, listeners compared the loudness balance of the competing musical accompaniment and vocal. Our preliminary results provide provisional validation of the psychophysical approach.
Auditory interference scenarios, where a listener wishes to attend to some target audio while being presented with interfering audio, are prevalent in daily life. An accurate computational model that can predict masking thresholds for such scenarios has yet to be fully developed. While some sophisticated, physiologically inspired masking prediction models exist, they are rarely tested with ecologically valid programmes (such as music and speech). In order to test the accuracy of model predictions, human listener data is required. To that end, a masking threshold experiment was conducted for a variety of target and interferer programmes. The results were analysed alongside predictions made by the computational auditory signal processing and prediction model (CASP) described by Jepsen et al. (2008). Masking thresholds were predicted to within 3.6 dB root mean squared error, with the greatest prediction inaccuracies occurring in the presence of speech. These results are comparable to those of the model by Glasberg and Moore (2005) for predicting the audibility of time-varying sounds in the presence of background sounds, which otherwise represent the most accurate predictions of this type in the literature.
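For readers unfamiliar with the error measure quoted above, the following sketch shows how a root mean squared error between predicted and measured thresholds is computed. The function name and the threshold values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def rmse_db(predicted, measured):
    """Root mean squared error between two equal-length lists of
    thresholds expressed in dB."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical thresholds in dB (illustrative only, not from the study).
predicted_thresholds = [-20.0, -15.0, -30.0]
measured_thresholds = [-23.0, -15.0, -26.0]

error = rmse_db(predicted_thresholds, measured_thresholds)
```

Because the errors are squared before averaging, a few large misses (such as those reported for speech) raise the figure more than many small ones.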