As a training target for deep neural networks, time-frequency masking has become a research focus in supervised speech reconstruction. Previous work has shown that the complex ratio mask (cRM) can simultaneously estimate the amplitude and phase components of the clean signal from noisy speech, and that it achieves the best performance among time-frequency masking targets. However, because the imaginary component of the mask lacks clear structure and is therefore difficult for a neural network to learn, no accurate estimation method exists yet. In this paper, we improve speech reconstruction with a complex-valued fully convolutional neural network (CFCCN). Based on the theory of complex-valued neural networks, we design complex-valued building blocks that allow the CFCCN to operate in the complex domain and estimate the cRM. The building blocks include complex convolution filters, complex activation functions, complex batch normalization, complex pooling, and weight initialization strategies. Experimental results show that, on both subjective and objective measures, this work achieves improvements of 1.3%-12.5% over state-of-the-art DNN-based speech reconstruction methods in challenging conditions, where the environmental noises are diverse and the signals are non-stationary.
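To make the role of the cRM concrete, the following is a minimal sketch, not the paper's implementation. It assumes the commonly used definition of the cRM as the complex quotient of the clean and noisy STFT coefficients, which the abstract does not state explicitly. The function names `complex_ratio_mask` and `apply_mask`, the `eps` stabilizer, and the random spectrograms are hypothetical and chosen only for illustration.

```python
import numpy as np


def complex_ratio_mask(clean_spec, noisy_spec, eps=1e-8):
    """Ideal cRM M such that M * noisy_spec ~= clean_spec (assumed definition)."""
    yr, yi = noisy_spec.real, noisy_spec.imag
    sr, si = clean_spec.real, clean_spec.imag
    # eps is an illustrative stabilizer against division by zero.
    denom = yr ** 2 + yi ** 2 + eps
    mask_real = (yr * sr + yi * si) / denom
    mask_imag = (yr * si - yi * sr) / denom
    return mask_real + 1j * mask_imag


def apply_mask(mask, noisy_spec):
    """Complex multiplication corrects both magnitude and phase."""
    return mask * noisy_spec


# Stand-in complex spectrograms (frequency bins x frames), not real speech.
rng = np.random.default_rng(0)
shape = (4, 5)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noisy = clean + 0.5 * noise

mask = complex_ratio_mask(clean, noisy)
estimate = apply_mask(mask, noisy)
```

In this sketch the mask is computed from a known clean signal, so it only illustrates the target that a network such as the CFCCN would be trained to predict. A real-valued magnitude mask would rescale `noisy` but leave its phase unchanged, whereas the imaginary part of the cRM is what allows the phase to be corrected as well.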