Abstract: We propose a monaural intrusive instrumental intelligibility metric called SIIB (speech intelligibility in bits). SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing information theoretic intelligibility metrics, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Our evaluation shows that relative to state-of-the-art intelligibility metrics, SIIB is highly correlated with the intelligibility of speech that has been degraded by noise and processed by speech enhancement algorithms.
Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and sEPSM^corr. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top performing metrics have high performance. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech that was distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance, achieving average correlations with listening test scores of ρ = 0.92 and ρ = 0.89, respectively. The high performance of SIIB may, in part, be the result of SIIB's developers having access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, the paper presents a new version of SIIB called SIIB^Gauss, which has similar performance to SIIB and HASPI, but takes two orders of magnitude less time to compute.
Instrumental measures of speech intelligibility typically produce an index between 0 and 1 that is monotonically related to listening test scores. As such, these measures are dimensionless and do not represent physical quantities. In this paper, we propose a new instrumental intelligibility metric that describes speech intelligibility using bits per second. The proposed metric builds upon an existing intelligibility metric that was motivated by information theory. Our main contribution is that we use a statistical model of speech communication that accounts for noise inherent in the speech production process. Experiments show that the proposed metric performs at least as well as existing state-of-the-art intelligibility metrics.
Information bottleneck (IB) is a method for extracting information from one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X; T) and high mutual information I(Y; T). The IB curve characterizes the set of bottleneck variables that achieve maximal I(Y; T) for a given I(X; T), and is typically explored by maximizing the IB Lagrangian, I(Y; T) − βI(X; T). In some cases, Y is a deterministic function of X, including many classification problems in supervised learning where the output class Y is a deterministic function of the input X. We demonstrate three caveats when using IB in any situation where Y is a deterministic function of X: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of β; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when Y is a small perturbation away from being a deterministic function of X, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.
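To make the IB Lagrangian I(Y; T) − βI(X; T) concrete, the following is a minimal sketch that evaluates it for small discrete distributions. It is not code from the paper: the function names `mutual_information` and `ib_lagrangian`, the toy joint distribution, and the two example encoders are all illustrative assumptions. The sketch assumes the Markov chain T – X – Y, so that T depends on Y only through X.

```python
import numpy as np


def mutual_information(p_ab):
    """Mutual information I(A; B) in bits from a joint table p_ab[a, b]."""
    p_ab = np.asarray(p_ab, dtype=float)
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of A, shape (|A|, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of B, shape (1, |B|)
    mask = p_ab > 0                          # skip zero cells (0 * log 0 = 0)
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))


def ib_lagrangian(p_xy, p_t_given_x, beta):
    """Return (I(Y;T) - beta * I(X;T), I(X;T), I(Y;T)) for an encoder p(t|x)."""
    p_xy = np.asarray(p_xy, dtype=float)                 # shape (|X|, |Y|)
    p_t_given_x = np.asarray(p_t_given_x, dtype=float)   # shape (|X|, |T|)
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * p_t_given_x    # joint p(x, t)
    p_yt = p_xy.T @ p_t_given_x          # joint p(y, t) = sum_x p(x, y) p(t|x)
    i_xt = mutual_information(p_xt)
    i_yt = mutual_information(p_yt)
    return i_yt - beta * i_xt, i_xt, i_yt


# Hypothetical toy problem: X uniform on {0, 1, 2, 3}, Y = X mod 2 (deterministic).
p_xy = np.zeros((4, 2))
for x in range(4):
    p_xy[x, x % 2] = 0.25

encoder_identity = np.eye(4)             # T = X (no compression)
encoder_label = np.zeros((4, 2))         # T = Y (keeps only the label)
for x in range(4):
    encoder_label[x, x % 2] = 1.0

if __name__ == "__main__":
    for name, enc in [("T = X", encoder_identity), ("T = Y", encoder_label)]:
        value, i_xt, i_yt = ib_lagrangian(p_xy, enc, beta=0.5)
        print(f"{name}: I(X;T)={i_xt:.3f}  I(Y;T)={i_yt:.3f}  Lagrangian={value:.3f}")
```

In this toy case both encoders retain the full 1 bit about Y, but T = Y needs only 1 bit about X whereas T = X needs 2 bits, so at β = 0.5 the Lagrangian is 0.5 for T = Y and 0 for T = X. The example only illustrates how the objective is evaluated; it does not reproduce the paper's analysis of the three caveats.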