Single channel speech dereverberation and separation using RPCA and SNMF

Ullah, Rizwan; Islam, Shohidul; Hossain, Md. Imran; Wahab, Fazal; Ye, Zhongfu

doi:10.1016/j.apacoust.2020.107406

Cited by 10 publications

(5 citation statements)

References 38 publications


“…They showed the PCA as a great tool for voice recognition, and ICA can separate the signal near about original signal. A singing voice separation method using Robust PCA is presented [3]. The repetition structure of music accompaniment can be regarded as a low-rank subspace, and singing voices can be considered sparse inside the songs.…”

Section: Literature Review (mentioning)

confidence: 99%
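
The low-rank plus sparse idea in the statement above can be illustrated with a small robust PCA sketch. This is not the implementation of any cited paper: it is a generic principal component pursuit solver using an inexact augmented Lagrangian scheme, and the function names (`rpca`, `soft_threshold`), the default `lam`, and the penalty schedule (`rho`, initial `mu`) are assumptions chosen for illustration. In a singing-voice setting, `M` would be a magnitude spectrogram, with `L` approximating the repetitive accompaniment and `S` the sparse vocal part.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage: the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def rpca(M, lam=None, rho=1.2, n_iter=500, tol=1e-7):
    """Split M into low-rank L and sparse S with M ~= L + S.

    Minimises ||L||_* + lam * ||S||_1 subject to L + S = M using an
    inexact augmented Lagrangian loop with a growing penalty mu.
    All default values are illustrative assumptions.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))      # common default weight
    norm_fro = np.linalg.norm(M)            # Frobenius norm
    mu = 1.25 / np.linalg.norm(M, 2)        # spectral norm sets the scale
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                    # Lagrange multipliers
    for _ in range(n_iter):
        # Low-rank step: singular value thresholding.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse step: element-wise shrinkage.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint residual, then raise the penalty.
        R = M - L - S
        Y = Y + mu * R
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R) <= tol * norm_fro:
            break
    return L, S
```

Calling `L, S = rpca(M)` on a real-valued matrix returns two arrays of the same shape as `M`. How well the split matches a true low-rank and sparse pair depends on the rank, the density of the sparse part, and `lam`.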

“…Depending on the variety of channels, speech separation concerns are categorized as single-channel, multichannel, or binaural. A single-channel SS (SCSS) method [3][4][5] is complicated because only a single recording is obtainable, and the description that may be retrieved is limited. Most SCSS approaches can be divided into two types: those based on computational auditory scene analysis (CASA) and those based on models.…”

Section: Introduction (mentioning)

confidence: 99%


Single-channel Speech Separation Based on Double-density Dual-tree CWT and SNMF

Hossain, Rahim, Hossain. AETiC, 2024.

Speech is essential to human communication; therefore, distinguishing it from noise is crucial. Speech separation becomes challenging in real-world circumstances with background noise and overlapping speech. Moreover, speech separation based on the short-time Fourier transform (STFT) and the discrete wavelet transform (DWT) suffers from time-frequency resolution and time-variation issues, respectively. To solve these issues, a new speech separation technique is presented based on the double-density dual-tree complex wavelet transform (DDDTCWT) and sparse non-negative matrix factorization (SNMF). The signal is separated into high-pass and low-pass frequency components using DDDTCWT wavelet decomposition. For this analysis, we only considered the low-pass frequency components and zeroed out the high-pass ones. The STFT is then applied to each sub-band signal to generate a complex spectrogram. We then use SNMF to factorize the joint form of the magnitude and the absolute values of the real and imaginary (RI) components into basis and weight matrices. Most researchers enhance the magnitude spectra only, ignore the phase spectra, and estimate the separated speech using the noisy phase. As a result, some noise components remain in the estimated speech. We process the signal's magnitude as well as the RI components and estimate the phase of the RI parts. Finally, the separated speech signals are obtained using the inverse STFT (ISTFT) and the inverse DDDTCWT (IDDDTCWT). Separation performance improves because the phase component is estimated and because of the shift-invariance, better direction selectivity, and scheme freedom properties of the DDDTCWT. The proposed algorithm outperforms the NMF method with masking on the TIMIT dataset by 6.53–8.17 dB in SDR, 7.37–9.87 dB in SAR, and 14.92–17.21 dB in SIR.
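
The factorization step named in this abstract can be sketched as follows. The abstract does not state the cost function or update rules, so this example assumes a generalized Kullback-Leibler cost with an L1 penalty on the weight matrix and standard multiplicative updates. The names `snmf_kl`, `kl_divergence`, and the `sparsity` parameter are hypothetical, and `V` stands for any non-negative matrix such as a magnitude spectrogram.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_divergence(V, V_hat):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    return float(np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat))


def snmf_kl(V, rank, sparsity=0.1, n_iter=200, seed=0):
    """Approximate V (freq x frames) as W @ H with a sparse H.

    Assumed objective: D_KL(V || W H) + sparsity * sum(H), with the
    columns of W rescaled to unit L1 norm after every update.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + EPS
    H = rng.random((rank, n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Weight update: the sparsity term enters the denominator.
        ratio = V / (W @ H + EPS)
        H *= (W.T @ ratio) / (W.T @ ones + sparsity)
        # Basis update.
        ratio = V / (W @ H + EPS)
        W *= (ratio @ H.T) / (ones @ H.T + EPS)
        # Normalise each basis vector and move its scale into H,
        # which leaves the product W @ H unchanged.
        scale = W.sum(axis=0, keepdims=True) + EPS
        W /= scale
        H *= scale.T
    return W, H
```

A larger `sparsity` value pushes more of `H` toward zero at the cost of a looser fit. The renormalisation step is a common heuristic and means the penalised objective is not guaranteed to decrease at every iteration.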


“…The combination of the sparse and NMF coding algorithms results in a model learning method called SNMF [10][11][12]. This technique results in a sparser representation than the NMF algorithm to apply the sparse constraints.…”

Section: The Proposed Voice Activity Detector (mentioning)

confidence: 99%

“…As stated, the SNMF algorithm will obtain a sparser representation to consider a specified constraint than the NMF algorithm [11][12][13]. The generalized Kullback-Leibler divergence algorithm is then used to determine the approximation error in the analysis of non-negative coefficients, which results in the following optimization problem :…”

Section: The Proposed Voice Activity Detector (mentioning)

confidence: 99%

A Voice Activity Detection Algorithm Using Sparse Non-negative Matrix Factorization-based Model Learning in Spectro-Temporal Domain

Mavaddati. IJE, 2023.

Voice activity detectors are presented to extract silence/speech segments of the speech signal to eliminate different background noise signals. A novel voice activity detector is proposed in this paper using spectro-temporal features extracted from the auditory model of the speech signal. After extracting the scale, rate, and frequency features from this feature space, a sparse structured principal component analysis algorithm is used to consider the basic components of these features and reduce the dimension of learning data. Then these feature vectors are employed to learn the models by the sparse non-negative matrix factorization algorithm. The model learning procedure is performed to represent each feature vector with a proper sparse rate based on the selected atoms. Voice activity detection of the input frames is performed by computing the energy of the sparse representation for each input frame over the composite model. If the calculated energy exceeds a specified threshold, it indicates that the input frame has a structure similar to the atoms of the learned models, and the frame is concluded to have voice content. The results of the proposed detector were compared with other baseline methods and classifiers in this field. Results in the presence of stationary, non-stationary, and periodic noises show that the proposed method based on model learning with spectro-temporal features can correctly detect silence/speech activity.
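
The decision rule in this abstract (energy of the sparse representation compared against a threshold) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's detector: the dictionary `W` is taken as already learned, the sparse code is estimated with a Euclidean-cost multiplicative update with an L1 penalty, and `sparse_activations`, `is_speech`, `sparsity`, and `threshold` are hypothetical names and values.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero


def sparse_activations(W, v, sparsity=0.1, n_iter=100):
    """Non-negative sparse code h for frame v over a fixed dictionary W.

    Multiplicative updates for 0.5 * ||v - W h||^2 + sparsity * sum(h).
    """
    h = np.ones(W.shape[1])
    WtW = W.T @ W
    Wtv = W.T @ v
    for _ in range(n_iter):
        h *= Wtv / (WtW @ h + sparsity + EPS)
    return h


def is_speech(W, v, threshold=1.0, sparsity=0.1):
    """Return (decision, energy): True when the code energy exceeds threshold."""
    h = sparse_activations(W, v, sparsity)
    energy = float(np.sum(h ** 2))
    return energy > threshold, energy
```

In practice the threshold would be tuned on held-out data, and the feature vector `v` would come from the spectro-temporal front end described in the abstract rather than from raw spectra.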


“…[22] deals with noisy and reverberant speech separation by estimating a room impulse response. [23] used robust principal component analysis and sparse nonnegative matrix factorization for reverberant speech separation. [24] applies a diffusion-based generative technology to separate a mixture of reverberant speech.…”

Section: Related Work (mentioning)

confidence: 99%

Leveraging Sparse Approximation for Monaural Overlapped Speech Separation From Auditory Perspective

Sekiguchi, Narusue, Morikawa. IEEE Access, 2023.

Neuroscience suggests that the sparse behavior of a neural population underlies the mechanisms of the auditory system for monaural overlapped speech separation. This study investigates leveraging sparse approximation to improve speech separation in a conventional deep learning algorithm. We develop a combined model that embeds a sparse approximation algorithm, a multilayered iterative soft thresholding algorithm (ML-ISTA), into a conventional time-domain-based speech separation algorithm, Conv-TasNet. Adopting ML-ISTA is a crucial enabler for the embedding process and helps avoid solving a bi-level optimization problem comprising sparse approximation and speech separation. ML-ISTA performs sparse approximation through forward calculations, thereby eliminating the optimization of sparse approximation. The combined model is trained with WSJ0-2mix, the Wall Street Journal English corpus for two-speaker mixed speech without noisy or reverberant interference, to clarify the proposed method's performance. The model demonstrates that sparse approximation improves separation performance regardless of the approximation setting. The peak performance of the model exceeds that of Conv-TasNet by 1.1% to 4.7% in four speech quality criteria. Moreover, sparse approximation accelerates the combined model performance gain at the early stages of learning relative to Conv-TasNet. The primary novelty of the study is embedding the sparse approximation algorithm, ML-ISTA, into a deep-learning-based speech separation framework and the experimental proof of improved separation performance in the proposed algorithm.
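
For readers unfamiliar with iterative soft thresholding, the building block behind ML-ISTA can be sketched in its classical single-layer form. This is not the multilayered variant embedded in Conv-TasNet that the abstract describes. It solves a plain lasso problem over a fixed dictionary, and the names `ista`, `lasso_objective`, `D`, and `lam` are assumptions for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise shrinkage: the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def lasso_objective(D, y, x, lam):
    """0.5 * ||D x - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((D @ x - y) ** 2) + lam * np.sum(np.abs(x))


def ista(D, y, lam=0.1, n_iter=200):
    """Single-layer ISTA: a gradient step followed by soft thresholding."""
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

Each iteration is a fixed forward computation, which is the property the abstract relies on when it says sparse approximation is performed "through forward calculations". Unrolling a fixed number of such iterations yields a network-like layer stack.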

