Noise estimation is a crucial stage in speech enhancement (SE), and it commonly requires prior models of speech, noise, or both. However, prior models can be ineffective against unseen nonstationary noise, especially at low signal-to-noise ratio (SNR) levels. This paper assesses the efficacy of an unsupervised SE approach based on weighted low-rank and sparse matrix factorization, which estimates noise and speech when neither is available beforehand by decomposing the noisy input spectrum into a low-rank noise component and a sparse speech component. Such decomposition techniques are constrained because they only approximate the actual rank of the noise and do not directly exploit the low-rank property in the optimization. Nuclear norm minimization (NNM) is the most well-known approach, as it can exactly recover a matrix's rank under certain restrictive theoretical conditions. In many situations, however, NNM cannot reliably estimate the matrix rank. Significant advances in computer vision and machine learning applications have shown that weighted nuclear norm minimization (WNNM) overcomes these shortcomings and achieves a better matrix rank approximation than NNM. Consequently, in this study we present alternative SE algorithms that use weighted low-rank and sparsity constraints to separate speech and noise spectrograms. The algorithms were then trained and evaluated with a standard automatic speech recognition (ASR) engine to reduce the word error rate (WER). Extensive experiments on speech corrupted by real-world noise show that the proposed model outperforms existing state-of-the-art models on objective measures such as SDR, PESQ, SIG, BAK, OVL, and STOI across varied noise conditions in low-SNR environments.
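The low-rank plus sparse decomposition described above can be illustrated with a minimal sketch. It assumes a robust-PCA-style objective, min ||L||_{w,*} + lam * ||S||_1 subject to M = L + S, where M is a (frequency x frame) magnitude spectrogram, L is the low-rank noise estimate and S is the sparse speech estimate. The solver (an inexact augmented Lagrangian loop), the reweighting rule w_i = c / (sigma_i + eps), the defaults for c, lam and rho, and the function names are all assumptions drawn from common WNNM and robust-PCA practice, not the paper's exact algorithm.

```python
import numpy as np


def weighted_svt(X, c, eps=1e-6):
    """Weighted singular value thresholding (proximal step of a weighted nuclear norm).

    Each singular value sigma_i is shrunk by c * w_i with the ASSUMED weights
    w_i = 1 / (sigma_i + eps), so dominant components are penalised less than
    small ones. Since sigma_i is non-increasing, the weights are non-decreasing,
    which is the case where this closed-form step is exact.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = 1.0 / (s + eps)
    s_shrunk = np.maximum(s - c * w, 0.0)
    return (U * s_shrunk) @ Vt


def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal step of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def wnnm_rpca(M, c=1.0, lam=None, rho=1.5, n_iter=300, tol=1e-7):
    """Hypothetical sketch: split M into low-rank L (noise) plus sparse S (speech)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = 1.25 / np.linalg.norm(M, 2)          # spectral norm sets the initial penalty
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                       # Lagrange multiplier for M = L + S
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(n_iter):
        L = weighted_svt(M - S + Y / mu, c / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                          # constraint residual
        Y = Y + mu * R
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(R, "fro") / norm_M < tol:
            break
    return L, S
```

Weighting each singular value inversely to its magnitude is what separates WNNM from plain NNM: the dominant noise components are shrunk less than the weak ones, so the low-rank estimate keeps more of the noise structure. In an SE setting, M would typically be the STFT magnitude of the noisy signal, and S (clipped to non-negative values) could serve as a speech magnitude estimate or mask.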
In speech communication applications such as teleconferencing and mobile telephony, real-time noise degrades speech quality and intelligibility. For multichannel speech enhancement in these applications, adaptive beamforming algorithms play a more important role than fixed beamforming algorithms. Among adaptive beamformers, Generalized Sidelobe Canceller (GSC) beamforming with the Least Mean Square (LMS) algorithm has the lowest complexity but provides poor noise reduction, whereas GSC beamforming with the Combined LMS (CLMS) algorithm achieves better noise reduction at a high computational cost. To balance noise reduction against computational complexity in real-time noisy conditions, a GSC beamformer based on a Signed Convex Combination of Fast Convergence (SCCFC) algorithm is proposed for multichannel speech enhancement. The SCCFC algorithm is implemented as a signed convex combination of two Fast Convergence Normalized Least Mean Square (FCNLMS) adaptive filters with different step sizes. This improves the overall performance of the GSC beamformer in real-time noisy conditions and reduces computational complexity compared with existing GSC algorithms. The proposed multichannel speech enhancement system is evaluated using standard speech processing performance metrics, and the simulation results demonstrate the superiority of the proposed GSC-SCCFC beamformer over traditional methods.
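The adaptive stage of such a combination scheme can be sketched as follows, assuming the usual GSC layout in which d is the fixed-beamformer output and the blocking-matrix outputs serve as noise references. The abstract does not specify the FCNLMS update or the exact "signed" mixing rule, so this sketch substitutes two standard NLMS filters (fast and slow step sizes) and updates a sigmoid-parametrised mixing weight using only the sign of the combined output error. These substitutions, the function name and all default parameters are assumptions.

```python
import numpy as np


def combined_nlms_canceller(d, refs, taps=16, mu_fast=0.5, mu_slow=0.05,
                            mu_a=1.0, delta=1e-6):
    """Hypothetical GSC noise canceller built from a convex combination of two NLMS filters.

    d    : fixed-beamformer output, shape (N,)
    refs : blocking-matrix outputs (noise references), shape (K, N)
    Returns the enhanced output e[n] = d[n] - y[n].
    """
    K, N = refs.shape
    w_fast = np.zeros(K * taps)
    w_slow = np.zeros(K * taps)
    a = 0.0                                    # mixing parameter, lambda = sigmoid(a)
    out = np.zeros(N)
    padded = np.concatenate([np.zeros((K, taps - 1)), refs], axis=1)
    for n in range(N):
        # Stack the last `taps` samples of every reference, newest first.
        x = padded[:, n:n + taps][:, ::-1].ravel()
        y_fast = w_fast @ x
        y_slow = w_slow @ x
        lam = 1.0 / (1.0 + np.exp(-a))
        y = lam * y_fast + (1.0 - lam) * y_slow    # convex combination of outputs
        e = d[n] - y
        out[n] = e
        # Standard NLMS updates (stand-ins for FCNLMS), each driven by its own error.
        norm = x @ x + delta
        w_fast += mu_fast * (d[n] - y_fast) * x / norm
        w_slow += mu_slow * (d[n] - y_slow) * x / norm
        # ASSUMED "signed" mixing update: uses only the sign of the combined error.
        a += mu_a * np.sign(e) * (y_fast - y_slow) * lam * (1.0 - lam)
        a = float(np.clip(a, -4.0, 4.0))       # keep lambda away from 0 and 1
    return out
```

Because each filter adapts on its own error and only the scalar mixing parameter sees the combined error, the output tends to follow the fast filter during initial convergence and drift toward the slow, lower-misadjustment filter in steady state. Using only the sign of the error makes the mixing update insensitive to the error's magnitude.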
The limited narrowband (LNB) speech signal used in public switched telephone networks, which spans 300 to 3400 Hz, results in poor-quality telephony speech. Bandwidth extension techniques expand LNB speech to the clear wideband (CWB) range of 50 to 7000 Hz over existing public telephone networks. In this paper, a novel robust speech bandwidth extension algorithm based on a hybrid Discrete Wavelet Transform-Discrete Cosine Transform data hiding (DWT-DCT-DH) model is used to carry the out-of-band (3400 to 7000 Hz) speech frequencies over the LNB speech. In the proposed technique, the out-of-band speech frequencies are embedded in the LNB speech and transmitted imperceptibly over the network. The embedded out-of-band frequencies are then reliably recovered at the receiver to generate restored CWB telephony speech of considerably better quality. Simulation results show that the proposed technique produces more intelligible, better-quality telephony speech than other bandwidth extension techniques.
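The embed-and-recover idea can be sketched on a single frame, assuming a one-level Haar DWT, an orthonormal DCT-II of the approximation band, and quantization index modulation (QIM) to hide bits in a few mid-range DCT coefficients. The abstract does not say which wavelet, coefficients or embedding rule the DWT-DCT-DH model uses, nor how the 3400 to 7000 Hz band is encoded into bits (for example, as quantized high-band parameters), so all of these, the step size delta, the start index and the function names are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """One-level orthonormal Haar DWT; len(x) must be even."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C


def qim_embed(c, bit, delta):
    """Quantize c onto the lattice for `bit` (offset 0 or delta / 2)."""
    offset = bit * delta / 2.0
    return delta * np.round((c - offset) / delta) + offset


def qim_decode(c, delta):
    """Return the bit whose lattice point lies closest to c."""
    d0 = abs(c - qim_embed(c, 0, delta))
    d1 = abs(c - qim_embed(c, 1, delta))
    return int(d1 < d0)


def embed_bits(frame, bits, delta=0.02, start=8):
    """Transmitter: hide `bits` (hypothetical high-band side information) in an LNB frame."""
    approx, detail = haar_dwt(frame)
    C = dct_matrix(approx.size)
    coeffs = C @ approx
    for j, b in enumerate(bits):
        coeffs[start + j] = qim_embed(coeffs[start + j], int(b), delta)
    return haar_idwt(C.T @ coeffs, detail)


def extract_bits(frame, n_bits, delta=0.02, start=8):
    """Receiver: recover the hidden bits from the received frame."""
    approx, _ = haar_dwt(frame)
    coeffs = dct_matrix(approx.size) @ approx
    return np.array([qim_decode(coeffs[start + j], delta) for j in range(n_bits)])
```

Because both transforms are orthonormal, each QIM step moves one coefficient by at most delta / 2, which bounds the distortion added to the narrowband frame. A larger delta would make extraction more robust to channel noise at the cost of more audible distortion.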