Separating target speech from a noisy signal is important for many real-world applications. Recently, deep neural networks (DNNs) have been widely used in speech enhancement (SE) and have achieved substantial performance improvements. However, current deep models require a large amount of training data to perform well, and it remains challenging to construct an effective deep speech enhancement model when only a few training samples are available. Meta-learning has become a research focus in few-shot learning because of its ability to handle new tasks quickly with few samples by exploiting prior meta-knowledge, but very few works apply meta-learning to few-shot speech enhancement. In this paper, we propose a generic meta-learning framework, Meta-SE, which uses a U-Net as the meta-learner, to tackle the few-shot speech enhancement problem. Meta-SE is trained and optimized over varying speech enhancement tasks to acquire meta-knowledge, with the aim of generalizing quickly and well to new, unseen noises given few training samples. The experimental results show that the proposed method not only outperforms state-of-the-art DNN-SE models under few-shot conditions, but also learns a more general and flexible model for task adaptation.

INDEX TERMS Speech enhancement, single-channel, meta-learning, few-shot learning.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.