2006
DOI: 10.1121/1.2355480

Binaural segregation in multisource reverberant environments

Abstract: In a natural environment, speech signals are degraded by both reverberation and concurrent noise sources. While human listening is robust under these conditions using only two ears, current two-microphone algorithms perform poorly. The psychological process of figure-ground segregation suggests that the target signal is perceived as a foreground while the remaining stimuli are perceived as a background. Accordingly, the goal is to estimate an ideal time-frequency (T-F) binary mask, which selects the target if …
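The abstract is cut off before it states the selection criterion, so the sketch below is only an illustration of how an ideal T-F binary mask is commonly defined: a T-F unit is kept (mask value 1) when the target energy in that unit exceeds the interference energy by more than a threshold, and discarded (0) otherwise. The function name `ideal_binary_mask`, the parameter `threshold_db`, the 0 dB default, and the toy energy values are all assumptions made for this example, not details taken from the paper.

```python
import math


def ideal_binary_mask(target_energy, interference_energy, threshold_db=0.0):
    """Build a binary mask over a grid of T-F units.

    Both inputs are lists of rows (frequency channels), each row holding
    the per-frame energy of the premixed target or interference signal.
    A unit is set to 1 when its local target-to-interference ratio, in dB,
    is strictly greater than threshold_db (an assumed criterion).
    """
    mask = []
    for target_row, interference_row in zip(target_energy, interference_energy):
        mask_row = []
        for t, i in zip(target_row, interference_row):
            if t <= 0.0:
                keep = False          # no target energy: nothing to select
            elif i <= 0.0:
                keep = True           # target present, no interference
            else:
                keep = 10.0 * math.log10(t / i) > threshold_db
            mask_row.append(1 if keep else 0)
        mask.append(mask_row)
    return mask


# Toy 2x2 grid of T-F unit energies (made-up numbers).
target = [[1.0, 0.01], [0.25, 4.0]]
interference = [[0.04, 1.0], [0.25, 0.09]]
mask = ideal_binary_mask(target, interference)
```

Computing such a mask requires the premixed target and interference signals, which is why it serves as a goal to be estimated from the mixture rather than something available in practice.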

Cited by 50 publications
(33 citation statements)
references
References 34 publications
“…Time-frequency masking techniques have been proposed to deal with segregation in reverberant environments [4], [5]. Recent approaches have relied on probabilistic frameworks that jointly perform source localization and time-frequency masking to segregate multiple sources [6]-[8].…”
mentioning
confidence: 99%
“…In addition to SNR and ASR, the following abbreviations are used: DOA: direction of arrival; HSI: human speech intelligibility; HSQ: human speech quality; NRR: noise-residual ratio; RSR: retained-speech ratio; SDR: speech-to-interference ratio. Although none of these studies are formulated for real-time implementation, some algorithms are more suitable for real-time operations than others. Of these, the following algorithms that use beamforming to produce T-F masks are most promising: Aarabi and Shi (2004), Roman et al. (2006), and Boldt et al. (2008). The Aarabi and Shi algorithm based on phase analysis can be viewed as a form of fixed beamforming with given directions of arrival.…”
Section: Related Studies
mentioning
confidence: 99%
“…To deal with the difficulty posed by room reverberation, Roman and Wang (2004) and Roman, Srinivasan, and Wang (2006) proposed using adaptive beamforming to provide the basis for binary T-F masking in order to segregate a target source from a reverberant mixture. This use of beamforming to generate a binary mask should be contrasted with the use of a beamformer to produce a soft mask in conjunction with ICA (see .…”
Section: Beamforming and T-F Masking
mentioning
confidence: 99%
“…Studies show that speech reconstructed from the ideal binary mask produces large improvement in human speech intelligibility [3], [13], [35]. Such a goal has been shown to still be reasonable when room reverberation is present [42], [47].…”
mentioning
confidence: 99%