Separating Underdetermined Convolutive Speech Mixtures

Pedersen, Michael; Wang, DeLiang; Larsen, Jan; Kjems, Ulrik

doi:10.1007/11679363_84

Cited by 9 publications

(6 citation statements)

References 14 publications

“…Another option is to apply a convolutive ICA algorithm [19] instead of an instantaneous ICA method. This was done in [45]. The advantage of using a convolutive algorithm compared to an instantaneous algorithm is that the convolutive algorithm is able to segregate sources with larger microphone distances.…”

Section: E Separation Results For Reverberant Recordings (mentioning)

confidence: 99%

“…Even though the criterion is applied to narrow frequency bands, the performance becomes worse as reported in [65]. In [45], we used a single-microphone criterion based on the properties of speech. There are some advantages of applying an instantaneous ICA as opposed to applying a convolutive ICA algorithm.…”

Section: E Separation Results For Reverberant Recordings (mentioning)

confidence: 99%

“…In [44] it has been demonstrated that the approach can be used to segregate stereo music recordings into single instruments or singing voice. In [45] we described an extension to separate convolutive speech mixtures.…”

Section: Introduction (mentioning)

confidence: 99%

Two-Microphone Separation of Speech Mixtures

Pedersen

Wang

Larsen

et al. 2008

IEEE Trans. Neural Netw.

Separation of speech mixtures, often referred to as the cocktail party problem, has been studied for decades. In many source separation tasks, the separation method is limited by the assumption of at least as many sensors as sources. Further, many methods require that the number of signals within the recorded mixtures be known in advance. In many real-world applications, these limitations are too restrictive. We propose a novel method for underdetermined blind source separation using an instantaneous mixing model which assumes closely spaced microphones. Two source separation techniques have been combined: independent component analysis (ICA) and binary time-frequency (T-F) masking. By estimating binary masks from the outputs of an ICA algorithm, it is possible in an iterative way to extract basis speech signals from a convolutive mixture. The basis signals are afterwards improved by grouping similar signals. Using two microphones, we can separate, in principle, an arbitrary number of mixed speech signals. We show separation results for mixtures with as many as seven speech signals under instantaneous conditions. We also show that the proposed method is applicable to segregate speech signals under reverberant conditions, and we compare our proposed method to another state-of-the-art algorithm. The number of source signals is not assumed to be known in advance and it is possible to maintain the extracted signals as stereo signals.
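
The mask-estimation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `binary_masks` and `apply_mask`, the threshold parameter `tau`, and the toy magnitude values are all assumptions made for illustration, and the ICA stage and the iterative extraction are omitted. The sketch only shows how two binary T-F masks could be derived by comparing the magnitudes of two separated outputs and then applied to a mixture spectrogram.

```python
# Illustrative sketch only: names, threshold, and data are hypothetical.
# Inputs are magnitude spectrograms stored as lists of rows (time frames),
# each row holding one value per frequency bin.

def binary_masks(mag_a, mag_b, tau=1.0):
    """Return two binary masks from two magnitude spectrograms.

    A cell of mask_a is 1 where output A exceeds tau times output B,
    and likewise for mask_b. With tau > 1 some cells belong to neither mask.
    """
    mask_a = [[1 if a > tau * b else 0 for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(mag_a, mag_b)]
    mask_b = [[1 if b > tau * a else 0 for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(mag_a, mag_b)]
    return mask_a, mask_b


def apply_mask(mask, spec):
    """Keep the T-F cells of spec where mask is 1 and zero the rest."""
    return [[m * s for m, s in zip(mask_row, spec_row)]
            for mask_row, spec_row in zip(mask, spec)]


# Toy example: 2 time frames x 2 frequency bins.
mag_a = [[0.9, 0.1], [0.2, 0.8]]    # magnitudes of separated output A
mag_b = [[0.1, 0.7], [0.3, 0.1]]    # magnitudes of separated output B
mixture = [[1.0, 0.8], [0.5, 0.9]]  # magnitudes of one microphone signal

mask_a, mask_b = binary_masks(mag_a, mag_b)
masked_a = apply_mask(mask_a, mixture)
```

Because each mask is applied to a recorded microphone signal rather than to an ICA output, the same mask could in principle be applied to both microphone channels, which is consistent with the abstract's remark that extracted signals can be kept as stereo signals.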

“…Some methods rely on the observation that individual signals in a mixture are sparsely distributed in the time-frequency domain [39], [54]. This enables them to handle a variety of mixing conditions, including those involving more sources than sensors [35]. The use of a binary mask as the computational goal makes only weak assumptions about interference conditions.…”

Section: Introduction (mentioning)

confidence: 99%

Transforming Binary Uncertainties for Robust Speech Recognition

Srinivasan

Wang

2007

IEEE Trans. Audio Speech Lang. Process.

Self Cite

Abstract: Recently, several algorithms have been proposed to enhance noisy speech by estimating a binary mask that can be used to select those time-frequency regions of a noisy speech signal that contain more speech energy than noise energy. This binary mask encodes the uncertainty associated with enhanced speech in the linear spectral domain. The use of the cepstral transformation smears the information from the noise dominant time-frequency regions across all the cepstral features. We propose a supervised approach using regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain. This uncertainty is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition. Systematic evaluations on a subset of the Aurora4 task using the estimated uncertainty show substantial improvement over the baseline performance across various noise conditions. Index Terms: Binary time-frequency mask, computational auditory scene analysis (CASA), robust automatic speech recognition, spectrogram reconstruction, uncertainty decoding.

“…Hereby a higher signal to interference ratio is obtained. This method was further developed by Pedersen et al (2005 in order to segregate under-determined mixtures [228,229]. Because the T-F mask can be applied to a single microphone signal, the segregated signals can be maintained as e.g.…”

Section: Sparseness In the Time/frequency Domain (mentioning)

confidence: 99%