Spatial and coherence cues based time-frequency masking for binaural reverberant speech separation

Alinaghi, Atiyeh; Wang, Wenwu; Jackson, Philip J. B.

doi:10.1109/icassp.2013.6637735

Cited by 16 publications

(11 citation statements)

References 15 publications


“…8 and 9. We would like to note that incorporating a precedence model would be expected to improve the performance of binaural method in reverberation as suggested by our preliminary work in [39].…”

Section: E Spatially Diffuse Noise (mentioning)

confidence: 97%

Joint Mixing Vector and Binaural Model Based Stereo Source Separation

Alinaghi, Jackson, Liu, et al. 2014

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


In this paper, the mixing vector (MV) in the statistical mixing model is compared to the binaural cues represented by interaural level and phase differences (ILD and IPD). It is shown that the MV distributions are quite distinct while binaural models overlap when the sources are close to each other. On the other hand, the binaural cues are more robust to high reverberation than MV models. Exploiting this complementary behavior, we introduce a new robust algorithm for stereo speech separation which considers both additive and convolutive noise signals to model the MV and binaural cues in parallel and estimate probabilistic time-frequency masks. The contribution of each cue to the final decision is also adjusted by weighting the log-likelihoods of the cues empirically. Furthermore, the permutation problem of the frequency domain blind source separation (BSS) is addressed by initializing the MVs based on binaural cues. Experiments are performed systematically on determined and underdetermined speech mixtures in five rooms with various acoustic properties including anechoic, highly reverberant, and spatially-diffuse noise conditions. The results in terms of signal-to-distortion ratio (SDR) confirm the benefits of integrating the MV and binaural cues, as compared with two state-of-the-art baseline algorithms which only use MV or the binaural cues.
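The weighting of cue log-likelihoods described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combine_cue_masks`, the single `weight` parameter, and the convex combination of the two log-likelihoods are assumptions made for illustration, and the per-source log-likelihoods are taken as given rather than computed from fitted models.

```python
import numpy as np

def combine_cue_masks(loglik_binaural, loglik_mv, weight=0.5):
    """Turn two sets of per-source log-likelihoods into soft T-F masks.

    Both inputs have shape (n_sources, n_freq, n_frames).
    `weight` (hypothetical) sets the share given to the binaural cues;
    the remainder goes to the mixing-vector cue.
    """
    combined = weight * loglik_binaural + (1.0 - weight) * loglik_mv
    # Subtract the per-bin maximum before exponentiating for numerical stability.
    combined = combined - combined.max(axis=0, keepdims=True)
    posterior = np.exp(combined)
    # Normalise over sources so the masks sum to one in every T-F bin.
    return posterior / posterior.sum(axis=0, keepdims=True)

# Toy usage with random log-likelihoods for 2 sources, 4 bins, 3 frames.
rng = np.random.default_rng(0)
masks = combine_cue_masks(rng.normal(size=(2, 4, 3)),
                          rng.normal(size=(2, 4, 3)),
                          weight=0.7)
```

Each mask would then multiply the mixture spectrogram to give one source estimate. Setting `weight` to 1.0 or 0.0 reduces the sketch to a single-cue mask.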


“…It is hard to calculate accurate auto- and cross-PSD (i.e., Φ_ij(n, f)) using Equation (2) with finite-length X_1(n, f) and X_2(n, f). In previous studies [9][10][11], the PSD was estimated by multiplying exponentially decaying weight and summing continuous time-frequency bins over time as…”

Section: Interaural Coherence (mentioning)

confidence: 99%
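The citation statement above breaks off before giving the estimator, so the following is only a sketch of one common way to realise an exponentially weighted PSD estimate: a first-order recursive average. The function name `recursive_coherence`, the smoothing constant `alpha`, and the regulariser `eps` are assumptions and are not taken from the cited papers.

```python
import numpy as np

def recursive_coherence(X1, X2, alpha=0.9, eps=1e-12):
    """Magnitude coherence from recursively smoothed PSDs.

    X1, X2: complex STFTs of shape (n_freq, n_frames).
    alpha: forgetting factor (hypothetical value); closer to 1 means longer memory.
    """
    n_freq, n_frames = X1.shape
    phi11 = np.zeros(n_freq)
    phi22 = np.zeros(n_freq)
    phi12 = np.zeros(n_freq, dtype=complex)
    ic = np.zeros((n_freq, n_frames))
    for n in range(n_frames):
        # Every past frame contributes, with exponentially decaying weight.
        phi11 = alpha * phi11 + (1 - alpha) * np.abs(X1[:, n]) ** 2
        phi22 = alpha * phi22 + (1 - alpha) * np.abs(X2[:, n]) ** 2
        phi12 = alpha * phi12 + (1 - alpha) * X1[:, n] * np.conj(X2[:, n])
        ic[:, n] = np.abs(phi12) / np.sqrt(phi11 * phi22 + eps)
    return ic
```

With two channels that differ only by a fixed gain and phase, the estimate approaches one; with independent noise in the two channels it stays well below one, although it does not reach zero because the effective averaging length is finite.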

“…If the two microphones are apart far enough, the IC appears close to zero for diffuse sources and close to one for direct sources at most frequencies. Based on these characteristics, the performance is improved by applying IC to the direction of arrival (DoA) estimation of the speaker [9], speech, or source separation [10] and dereverberation [11][12][13] in a reverberation environment.…”

Section: Introduction (mentioning)

confidence: 99%

“…To get the best performance of the various speech preprocessing algorithms that use coherence, the estimated ICs of reverberant speech should match the ideal IC. Most of the algorithms [9][10][11][12][13] use the IC in the form of an infinite impulse response (IIR) filter in the calculation of power spectral densities. Due to recursive nature of the IIR filter, all of the past data have influence on the IC estimation, which may give an adversary effect on non-stationary speech data.…”

Section: Introduction (mentioning)

confidence: 99%


Interaural Coherence Estimation for Speech Processing in Reverberant Environment

Kim, Park 2020

Applied Sciences


Interaural coherence is used to quantify the effects of reverberation on speech, and previous studies applied the conventional method using all previous time data in the form of an infinite impulse response filter to estimate interaural coherence. To consider a characteristic of speech that continuously changes over time, this paper proposes a new method of estimating interaural coherence using time data within a finite length of speech, which is called the quasi-steady interval. The length of the quasi-steady interval is determined with various frequency bands, reverberation times, and short-time Fourier transform (STFT) variables through numerical experiment, and it decreased as reverberation time decreased and the frequency increased. In this interval, a diffuse speech, which is an infinite sum of reflected speeches of different propagating paths, is uncorrelated between two microphones apart from each other; thus, the coherence is close to zero. However, a direct speech measured at the two microphones has steady amplitude and phase difference in this interval; thus, the coherence is close to one. Moreover, the new method takes the form of a finite impulse response filter that has a linear phase delay or zero phase delay with respect to speech to frequency; thus, the same or zero time delay for each frequency is applied to the power spectral density. Therefore, the coherence estimation of the new method is closer to the ideal value than the conventional one, and the coherence is accurately estimated at the time–frequency bins of direct speech, which is time-varying according to speech variation.
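The finite-length estimate described in this abstract can be sketched as a moving average over a fixed number of STFT frames. This is an illustration only, not the paper's method: the rectangular window, the name `windowed_coherence`, and the default `length` are assumptions, and the paper's procedure for choosing the quasi-steady interval is not reproduced here.

```python
import numpy as np

def windowed_coherence(X1, X2, length=9, eps=1e-12):
    """Magnitude coherence from PSDs averaged over a finite frame window.

    X1, X2: complex STFTs of shape (n_freq, n_frames), n_frames >= length.
    length: window size in frames (hypothetical); an odd value keeps the
    rectangular window centred on each frame, i.e. zero delay.
    """
    kernel = np.ones(length) / length

    def smooth(P):
        # FIR smoothing along time, one frequency row at a time.
        return np.stack([np.convolve(row, kernel, mode="same") for row in P])

    phi11 = smooth(np.abs(X1) ** 2)
    phi22 = smooth(np.abs(X2) ** 2)
    phi12 = smooth(X1 * np.conj(X2))
    return np.abs(phi12) / np.sqrt(phi11 * phi22 + eps)
```

Unlike a recursive average, frames outside the window have no influence on the estimate, which is the property the abstract attributes to the finite impulse response form.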


“…Considering the fact that the prior information of speech and noise can improve speech quality, our former works [26,27] have shown an effectiveness of using binaural inter-channel cues between speech and noise to enhance speech. In previous studies based on the cue parameter [28][29][30][31][32][33][34][35][36][37][38][39], the binaural inter-channel cues [28][29][30][31][32][33][34][35][36][37] have been used to estimate ideal T-F mask in binaural computational auditory scene analysis (CASA) systems and have shown a good performance in binaural speech processing. In the BCC technique [40][41][42], the binaural inter-channel cues were viewed as the side information, which was combined with a down-mixed audio signal to recover the left channel and right channel audio signals.…”

Section: Introduction (mentioning)

confidence: 99%

Speech enhancement methods based on binaural cue coding

Wang, Bao 2019

J AUDIO SPEECH MUSIC PROC.


According to the encoding and decoding mechanism of binaural cue coding (BCC), in this paper, the speech and noise are considered as left channel signal and right channel signal of the BCC framework, respectively. Subsequently, the speech signal is estimated from noisy speech when the inter-channel level difference (ICLD) and inter-channel correlation (ICC) between speech and noise are given. In this paper, exact inter-channel cues and the pre-enhanced inter-channel cues are used for speech restoration. The exact inter-channel cues are extracted from clean speech and noise, and the pre-enhanced inter-channel cues are extracted from the pre-enhanced speech and estimated noise. After that, they are combined one by one to form a codebook. Once the pre-enhanced cues are extracted from noisy speech, the exact cues are estimated by a mapping between the pre-enhanced cues and a prior codebook. Next, the estimated exact cues are used to obtain a time-frequency (T-F) mask for enhancing noisy speech based on the decoding of BCC. In addition, in order to further improve accuracy of the T-F mask based on the inter-channel cues, the deep neural network (DNN)-based method is proposed to learn the mapping relationship between input features of noisy speech and the T-F masks. Experimental results show that the codebook-driven method can achieve better performance than conventional methods, and the DNN-based method performs better than the codebook-driven method.
