The presence of environmental additive noise in the vicinity of the user typically degrades the speech intelligibility of speech processing applications. This intelligibility loss can be compensated by properly preprocessing the speech signal prior to playout, often referred to as near-end speech enhancement. Although the majority of such algorithms focus primarily on the presence of additive noise, reverberation can also severely degrade intelligibility. In this paper we investigate how late reverberation and additive noise can be jointly taken into account in the near-end speech enhancement process. For this effort we use a recently presented approximation of the speech intelligibility index under a power constraint, which we optimize for speech degraded by both additive noise and late reverberation. The algorithm results in time-frequency dependent amplification factors that depend on both the additive noise power spectral density and the late reverberation energy. These amplification factors redistribute speech energy across frequency and perform a dynamic range compression. Experimental results using both instrumental intelligibility measures and intelligibility listening tests show that the proposed approach improves speech intelligibility over state-of-the-art reference methods when speech signals are degraded simultaneously by additive noise and reverberation. Speech intelligibility improvements in the order of 20% are observed.
Index Terms: Additive noise, approximated speech intelligibility index (SII), late reverberation, speech intelligibility.
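As a rough illustration of what "redistributing speech energy across frequency under a power constraint" can look like, the sketch below allocates a fixed power budget over frequency bands with classic water-filling. This is a stand-in technique, not the approximated-SII optimization of the abstract above: the log-based objective, the function name `waterfill_band_powers`, the band `weights`, and the choice to treat noise and late reverberation as one summed disturbance per band are all assumptions made for the example.

```python
import numpy as np

def waterfill_band_powers(noise_psd, late_reverb, weights, power_budget, iters=100):
    """Hypothetical helper: split `power_budget` over bands by water-filling.

    Maximizes sum_i weights[i] * log(1 + p[i] / d[i]) subject to
    sum(p) == power_budget and p >= 0, where d = noise_psd + late_reverb
    is an assumed combined disturbance (not the paper's model).
    """
    d = np.asarray(noise_psd, dtype=float) + np.asarray(late_reverb, dtype=float)
    w = np.asarray(weights, dtype=float)  # must be strictly positive
    # The optimum is p_i = max(w_i * mu - d_i, 0); find mu by bisection.
    lo, hi = 0.0, (power_budget + d.max()) / w.min()
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(w * mu - d, 0.0).sum() > power_budget:
            hi = mu
        else:
            lo = mu
    return np.maximum(w * 0.5 * (lo + hi) - d, 0.0)

# Example with made-up numbers for four bands.
speech_psd = np.array([4.0, 2.0, 1.0, 0.5])
new_power = waterfill_band_powers(
    noise_psd=[0.5, 1.0, 2.0, 0.2],
    late_reverb=[0.5, 0.2, 0.1, 0.1],
    weights=[1.0, 1.0, 1.0, 1.0],
    power_budget=speech_psd.sum(),  # same total power as the unprocessed speech
)
amplitude_gains = np.sqrt(new_power / speech_psd)  # per-band amplification factors
```

The total output power equals the input power, so the gains only move energy between bands. In a reverberant setting the late reverberation itself depends on earlier processed frames, which this per-frame sketch ignores.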
In this article, we address speech reinforcement (near-end listening enhancement) for a scenario where there are several playback zones. In such a framework, signals from one zone can leak into other zones (crosstalk), causing intelligibility and/or quality degradation. An optimization framework is built by exploring a signal model where effects of noise, reverberation and zone crosstalk are taken into account simultaneously. Through the symbolic usage of a general smooth distortion measure, necessary optimality conditions are derived in terms of distortion measure gradients and the signal model. Subsequently, as an illustrative example of the framework, the conditions are applied for the mean-square error (MSE) expected distortion under a hybrid stochastic-deterministic model for the corruptions. A crosstalk cancellation algorithm follows, which depends on diffuse reverberation and across zone direct path components. Simulations validate the optimality of the algorithm and show a clear benefit in multizone processing, as opposed to the iterated application of a single-zone algorithm. Also, comparisons with least-squares crosstalk cancellers in literature show the profit of using a hybrid model.
Index Terms: Near-end listening enhancement, speech reinforcement, multizone, public address system.
In this paper, a time-frequency weighting is proposed for speech reinforcement (near-end listening enhancement) in a noisy and reverberant environment, which optimizes a perceptual distortion measure locally for each time-frequency bin. The algorithm acts as a dynamic range compressor, smearing out the energy of the clean speech along time. Simulations predict an intelligibility increase with respect to the unprocessed condition and two reference methods, for moderate smoothing windows, as measured by the optimized distortion measure and two objective intelligibility measures.
Modern communication technology facilitates communication from-anywhere to-anywhere. As a result, low speech intelligibility has become a common problem, which is exacerbated by the lack of feedback to the talker about the rendering environment. In recent years, a range of algorithms has been developed to enhance the intelligibility of speech rendered in a noisy environment. We describe methods for intelligibility enhancement from a unified vantage point. Before one defines a measure of intelligibility, the level of abstraction of the representation must be selected. For example, intelligibility can be measured on the message, on the sequence of words spoken, on the sequence of sounds, or on a sequence of states of the auditory system. Natural measures of intelligibility defined at the message level are mutual information and the hit-or-miss criterion. The direct evaluation of high-level measures requires quantitative knowledge of human cognitive processing. Lower level measures can be derived from higher level measures by making restrictive assumptions. We discuss the implementation and performance of some specific enhancement systems in detail, including speech intelligibility index (SII) based systems and systems aimed at enhancing the sound-field where it is perceived by the listener. We conclude with a discussion of the current state of the field and open problems.
In this article, we address near-end speech enhancement for a scenario where there are several playback zones. A signal model is explored, where effects of noise, reverberation and zone crosstalk are taken into account simultaneously. Through the symbolic usage of a general smooth distortion measure, necessary optimality conditions are derived. The conditions are applied to a DFT magnitude-based distortion measure and an algorithm follows, which applies per-zone spectral subtraction followed by channel inversion. Simulations validate the optimality of the algorithm and show a clear benefit in multizone processing, as opposed to the iterated application of a single-zone algorithm.
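The two-stage structure named above (per-zone spectral subtraction, then channel inversion) can be sketched for a single frequency bin as follows. This is one possible reading under stated assumptions, not the published algorithm: the function name `multizone_bin_drive`, the square real-valued crosstalk matrix `H`, and the use of plain power subtraction with flooring at zero are all hypothetical choices for illustration.

```python
import numpy as np

def multizone_bin_drive(target_mag, disturbance_pow, H):
    """Hypothetical helper for one frequency bin across Z zones.

    target_mag      : desired speech magnitude in each zone, shape (Z,)
    disturbance_pow : assumed noise-plus-reverberation power per zone, shape (Z,)
    H               : assumed Z x Z crosstalk matrix; H[i, j] is the gain
                      from the loudspeaker signal of zone j into zone i
    """
    target_mag = np.asarray(target_mag, dtype=float)
    disturbance_pow = np.asarray(disturbance_pow, dtype=float)
    # Stage 1: per-zone spectral (power) subtraction, floored at zero.
    cleaned = np.sqrt(np.maximum(target_mag**2 - disturbance_pow, 0.0))
    # Stage 2: channel inversion, so that H @ drive reproduces `cleaned`.
    drive = np.linalg.solve(np.asarray(H, dtype=float), cleaned)
    return drive, cleaned

# Example with made-up numbers for two zones with mild crosstalk.
H = np.array([[1.0, 0.3],
              [0.2, 1.0]])
drive, cleaned = multizone_bin_drive([1.0, 0.8], [0.19, 0.15], H)
```

Solving the linear system jointly for all zones is what distinguishes this from running a single-zone method once per zone, since each drive signal accounts for what leaks in from the others. A practical system would also need to handle an ill-conditioned `H`, which this sketch does not.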