This paper presents a novel dictionary learning (DL) method to improve the performance of sparsity-based single-channel speech separation (SCSS). Conventional approaches treat the sub-dictionaries as independent units and learn each one separately in the short-time Fourier transform (STFT) domain from its own training set. In contrast, we take the relationship between the sub-dictionaries into account and optimize them jointly in the time domain. By satisfying a designed discrimination constraint, a structured dictionary, whose atoms correspond more closely to the speaker labels, is learned, so that each source can be recovered from its corresponding reconstruction after sparse coding. An algorithm consisting of a sparse coding stage and a dictionary updating stage is proposed to solve this DL optimization problem. Two strategies, direct learning and adaptive learning, are presented for selecting the training sets used to learn the discriminative dictionary. Experimental results show that the proposed SCSS algorithms outperform the other tested approaches.
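The alternation between a sparse coding stage and a dictionary updating stage can be illustrated with a minimal generic sketch. It is not the method of the abstract: it omits the discrimination constraint and the joint optimization of the sub-dictionaries, and it assumes orthogonal matching pursuit for the coding stage and a least-squares (method of optimal directions) update for the dictionary stage. The names `omp` and `learn_dictionary`, and all parameters, are hypothetical.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k atoms of D (k >= 1)."""
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def learn_dictionary(X, n_atoms, k, n_iter=10, seed=0):
    """Alternate sparse coding and dictionary update on training frames X (dim x n_frames).

    Assumption: a plain least-squares update stands in for the constrained
    update described in the abstract.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse coding stage: code every training frame over the fixed dictionary.
        A = np.column_stack([omp(D, x, k) for x in X.T])
        # Dictionary updating stage: least-squares fit of D to X given the codes.
        D = X @ np.linalg.pinv(A)
        # Rescale atoms to unit norm; atoms that were never used stay zero.
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D /= np.maximum(norms, 1e-12)
    return D
```

In a separation setting, one such dictionary per speaker would be concatenated into a pair; the abstract's contribution lies in learning that pair jointly rather than independently as this sketch would.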
This paper presents a novel algorithm for learning a hierarchical dictionary in the short-time Fourier transform (STFT) domain, which can improve the performance of dictionary learning (DL)-based single-channel speech separation (SCSS). The goal of SCSS is to separate the underlying clean speech signals from a mixture, which is often achieved by learning a pair of discriminative sub-dictionaries and sparsely coding the mixed speech signal over the dictionary pair. This paper considers the case of two source speech signals. Unfortunately, existing DL approaches cannot adequately avoid source confusion: when the mixture is sparsely represented over the dictionary pair, parts of the target speech component are explained by atoms of the interferer's dictionary, and vice versa. To suppress source confusion further, we divide the training sets into two layers of components and learn hierarchical sub-dictionaries from the different layers. Experimental results verify that the proposed approach outperforms other existing approaches.
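The separation step and the notion of source confusion can be written out in a generic form. This is a sketch of the standard sparse-coding formulation, not the objective of either paper; the symbols x, s_i, D_i, a_i, and the sparsity level K are assumptions introduced here for illustration.

```latex
% Mixture of two sources and the concatenated dictionary pair
x = s_1 + s_2, \qquad D = [\, D_1 \;\; D_2 \,]

% Sparse coding of the mixture over the pair
\hat{a} = \arg\min_{a} \; \| x - D a \|_2^2
\quad \text{s.t.} \quad \| a \|_0 \le K,
\qquad \hat{a} = \begin{bmatrix} \hat{a}_1 \\ \hat{a}_2 \end{bmatrix}

% Each source is reconstructed from its own sub-dictionary
\hat{s}_1 = D_1 \hat{a}_1, \qquad \hat{s}_2 = D_2 \hat{a}_2
```

In this notation, source confusion means that part of s_1 is represented by atoms of D_2 and therefore ends up in the estimate of s_2, and vice versa. The more discriminative the sub-dictionaries, the smaller this cross-assignment.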