In this paper, we propose a novel speech dereverberation framework that utilizes deep neural network (DNN)-based spectrum estimation to construct linear inverse filters. The proposed dereverberation framework is based on the state-of-the-art inverse filter estimation algorithm called weighted prediction error (WPE) algorithm, which is known to effectively reduce reverberation and greatly boost the ASR performance in various conditions. In WPE, the accuracy of the inverse filter estimation, and thus the deverberation performance, is largely dependent on the estimation of the power spectral density (PSD) of the target signal. Therefore, the conventional WPE iteratively performs the inverse filter estimation, actual dereverberation and the PSD estimation to gradually improve the PSD estimate. However, while such iterative procedure works well when sufficiently long acoustically-stationary observed signals are available, WPE's performance degrades when the duration of observed/accessible data is short, which typically is the case for real-time applications using online block-batch processing with small batches. To solve this problem, we incorporate the DNN-based spectrum estimator into the framework of WPE, because a DNN can estimate the PSD robustly even from very short observed data. We experimentally show that the proposed framework outperforms the conventional WPE, and improves the ASR performance in real noisy reverberant environments in both single-channel and multichannel cases.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.