We propose a simple Bayesian model for single-channel speech separation using factorized source priors in a sliding-window, linearly transformed domain. Modeling each band source with a one-dimensional mixture of Gaussians leads to fast, tractable inference for the source signals. In simulations separating a male and a female speaker, using priors trained on the same speakers, performance is comparable to the blind separation approach of Jang and Lee [1], with an SNR improvement of 4.9 dB for both the male and the female speaker. The mixing coefficients can be estimated quite precisely using ML-II, but this estimation is quite sensitive to the accuracy of the priors; in contrast, the source separation quality for known mixing coefficients is quite insensitive to the accuracy of the priors. Finally, we discuss how to improve our approach while keeping the complexity low using machine learning and CASA approaches [1,2,3,4].
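To illustrate why one-dimensional mixture-of-Gaussians priors make inference tractable, the following is a minimal sketch rather than the implementation used in the paper. It assumes a single band in which the observed coefficient is y = a1*s1 + a2*s2 plus optional Gaussian noise, with known mixing coefficients. The function name `posterior_means`, the `noise_var` parameter, and the example prior values are hypothetical choices made for illustration. Under these assumptions, each pair of mixture components gives a jointly Gaussian model, so the posterior mean of each source is a responsibility-weighted sum of linear estimates.

```python
import numpy as np

def posterior_means(y, a, priors, noise_var=0.0):
    """Posterior means of two sources in one band (illustrative sketch).

    Assumed model: y = a[0]*s1 + a[1]*s2 + noise, noise ~ N(0, noise_var).
    Each prior is a tuple (weights, means, variances) of a 1-D Gaussian mixture.
    """
    (pi1, mu1, v1), (pi2, mu2, v2) = [
        tuple(np.asarray(p, dtype=float) for p in prior) for prior in priors
    ]
    y = np.atleast_1d(np.asarray(y, dtype=float))[:, None, None]  # shape (N, 1, 1)

    # Mean and variance of y for every component pair (j, k), shape (J, K).
    m = a[0] * mu1[:, None] + a[1] * mu2[None, :]
    V = a[0] ** 2 * v1[:, None] + a[1] ** 2 * v2[None, :] + noise_var

    # Posterior responsibility of each pair given y, computed in the log domain.
    log_w = (np.log(pi1)[:, None] + np.log(pi2)[None, :]
             - 0.5 * np.log(2.0 * np.pi * V) - 0.5 * (y - m) ** 2 / V)
    log_w -= log_w.max(axis=(1, 2), keepdims=True)
    r = np.exp(log_w)
    r /= r.sum(axis=(1, 2), keepdims=True)

    # Conditional Gaussian means per pair, averaged with the responsibilities.
    gain1 = a[0] * v1[:, None] / V
    gain2 = a[1] * v2[None, :] / V
    s1 = (r * (mu1[:, None] + gain1 * (y - m))).sum(axis=(1, 2))
    s2 = (r * (mu2[None, :] + gain2 * (y - m))).sum(axis=(1, 2))
    return s1, s2

# Hypothetical zero-mean priors with different variances for the two sources.
prior_a = ([0.7, 0.3], [0.0, 0.0], [0.1, 2.0])
prior_b = ([0.5, 0.5], [0.0, 0.0], [0.5, 4.0])
y_obs = np.array([-1.5, 0.2, 3.0])
mix = (0.8, 0.6)
est_a, est_b = posterior_means(y_obs, mix, (prior_a, prior_b))
```

With `noise_var=0.0` the two estimates reconstruct the observation exactly, that is, `mix[0]*est_a + mix[1]*est_b` equals `y_obs`. The sum over component pairs that normalizes the responsibilities is also the marginal likelihood of y for the given mixing coefficients, which is the quantity an ML-II estimate of those coefficients would maximize.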