Many speech separation methods require the microphones to be closely spaced, because they break down when the inter-microphone phase difference wraps around. In this paper, we present a novel two-microphone speech separation scheme that does not have this restriction. The technique uses estimated interaural time difference (ITD) statistics and a binary time-frequency mask to separate mixed speech sources. The novelties of the paper are: (1) the extended application of delay-and-sum beamforming (DSB) and the cosine function to ITD calculation; and (2) the clarification of the connection between the ideal binary mask and the DSB amplitude ratio. Our objective quality evaluation experiments demonstrate the effectiveness of the proposed method.
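The link between a binary time-frequency mask and the DSB amplitude can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the two candidate delays (`tau_a`, `tau_b`) are already known, that each time-frequency bin is dominated by one source, and that a bin goes to whichever steering delay yields the larger DSB amplitude. The function name and all parameters are hypothetical. The cosine enters through the identity |X1 + X2·e^{jωτ}|² = |X1|² + |X2|² + 2|X1||X2|·cos(φ1 − φ2 − ωτ), where φ1 and φ2 are the phases of X1 and X2, so the comparison stays meaningful even when ωτ exceeds π.

```python
import numpy as np

def dsb_binary_mask(X1, X2, freqs_hz, tau_a, tau_b):
    """Boolean mask, True where steering the DSB to tau_a gives the larger amplitude.

    X1, X2   : complex STFTs of the two microphones, shape (n_freq, n_frames)
    freqs_hz : centre frequency of each STFT row, shape (n_freq,)
    tau_a/b  : assumed inter-microphone delays (seconds) of the two sources
    """
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)[:, None]
    # Delay-and-sum: undo the candidate delay on microphone 2, then add.
    amp_a = np.abs(X1 + X2 * np.exp(1j * omega * tau_a))
    amp_b = np.abs(X1 + X2 * np.exp(1j * omega * tau_b))
    return amp_a > amp_b

# Synthetic check: every bin contains exactly one source (illustrative data only).
rng = np.random.default_rng(0)
freqs_hz = np.linspace(500.0, 4000.0, 8)   # 500, 1000, ..., 4000 Hz
tau_a, tau_b = 0.0, 3e-4                   # 0.3 ms is 1.2 cycles at 4 kHz (wrapped phase)
true_mask = rng.random((8, 5)) < 0.5       # True where source A owns the bin
X1 = (1.0 + rng.random((8, 5))) * np.exp(2j * np.pi * rng.random((8, 5)))
true_tau = np.where(true_mask, tau_a, tau_b)
X2 = X1 * np.exp(-2j * np.pi * freqs_hz[:, None] * true_tau)

mask = dsb_binary_mask(X1, X2, freqs_hz, tau_a, tau_b)
```

In this toy setting the mask recovers the bin ownership even at frequencies where the phase difference has wrapped. The two delays only become indistinguishable at frequencies where their difference is a whole number of periods.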
Speech enhancement is an important task in many applications such as speech recognition. Conventional methods require some principle by which to distinguish speech from noise, and the most successful enhancement requires strong models for both speech and noise. However, if the noise actually encountered differs significantly from the system's assumptions, performance declines rapidly. In this work, we propose an unsupervised speech enhancement system based on decomposing the time-frequency spectrogram into a sparse foreground (speech) and a low-rank background (noise), which makes few assumptions about the noise other than its limited spectral variation. An image-based masking step is also designed to compensate for the poor noise removal obtained with spectrogram decomposition alone. Evaluations via PESQ and SegSNR show that the new approach improves signal-to-distortion ratio and PESQ in most cases compared with several traditional speech enhancement algorithms.
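A sparse plus low-rank split of a magnitude spectrogram can be sketched with a generic robust-PCA solver. The code below is an assumption-laden illustration, not the system described in the abstract: it uses an inexact augmented Lagrangian iteration with the common default weight `lam = 1/sqrt(max(m, n))`, a synthetic matrix stands in for the spectrogram, and the image-based masking step is not modelled. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Shrink singular values: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def sparse_low_rank_split(M, lam=None, rho=1.5, tol=1e-7, max_iter=200):
    """Split M into L (low rank) + S (sparse) with an inexact ALM iteration."""
    M = np.asarray(M, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))   # common default weight (assumption)
    mu = 1.25 / np.linalg.norm(M, 2)        # heuristic initial penalty (assumption)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                    # Lagrange multiplier
    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                       # constraint residual
        Y = Y + mu * R
        mu *= rho                           # tighten the penalty each pass
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

# Synthetic stand-in for a magnitude spectrogram (illustrative data only):
# a stationary noise floor (rank 1) plus a few isolated "speech" peaks.
rng = np.random.default_rng(0)
noise = np.outer(rng.random(40) + 0.5, np.ones(30))
speech = np.zeros((40, 30))
speech[rng.integers(0, 40, 25), rng.integers(0, 30, 25)] = 5.0
M = noise + speech

L, S = sparse_low_rank_split(M)
```

Here `L` plays the role of the background noise estimate and `S` the foreground speech estimate. On real audio, the inputs would be STFT magnitudes and the output would typically be turned into a mask before resynthesis.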
In this letter, we present a novel speech separation scheme using two microphones. The proposed method uses estimated interaural time difference (ITD) statistics to separate mixed speech sources. The novelties of this paper are the use of a Generalized Gaussian Mixture Model (GGMM) for frame-by-frame speech separation and of the cross-correlation coefficient for distributed parameter selection. The proposed model can be extended to audio enhancement. Our objective quality evaluation experiments demonstrate the effectiveness of the proposed method and show significant quality improvements over conventional dual ITD-based methods.
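To make the GGMM idea concrete, the sketch below evaluates a generalized Gaussian density and the posterior probability that an ITD sample belongs to each mixture component. It is only an illustration under stated assumptions: the parameterisation (location `mu`, scale `alpha`, shape `beta`) is the standard textbook one, the component parameters are fixed by hand rather than estimated, and the cross-correlation-based parameter selection mentioned in the abstract is not modelled. Function names are hypothetical.

```python
import math
import numpy as np

def generalized_gaussian_pdf(x, mu, alpha, beta):
    """Density beta / (2*alpha*Gamma(1/beta)) * exp(-(|x - mu| / alpha)**beta).

    beta = 2 gives a Gaussian (alpha = sigma*sqrt(2)); beta = 1 gives a Laplacian.
    """
    x = np.asarray(x, dtype=float)
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * np.exp(-(np.abs(x - mu) / alpha) ** beta)

def ggmm_posteriors(x, weights, mus, alphas, betas):
    """Posterior probability of each component for every sample, shape (K, N)."""
    comps = np.stack([
        w * generalized_gaussian_pdf(x, m, a, b)
        for w, m, a, b in zip(weights, mus, alphas, betas)
    ])
    return comps / comps.sum(axis=0)

# Two hypothetical sources with ITDs near -0.3 ms and +0.3 ms.
itd_samples = np.array([-3.1e-4, 2.9e-4, -2.5e-4])
post = ggmm_posteriors(
    itd_samples,
    weights=[0.5, 0.5],
    mus=[-3e-4, 3e-4],
    alphas=[1e-4, 1e-4],
    betas=[1.5, 1.5],
)
labels = post.argmax(axis=0)   # component index assigned to each ITD sample
```

Assigning each time-frequency point to the component with the highest posterior is one simple way to turn such a mixture into a separation mask. The shape parameter `beta` controls how heavy-tailed each ITD cluster is allowed to be.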