2013
DOI: 10.1109/tasl.2013.2261814

Video-Aided Model-Based Source Separation in Real Reverberant Rooms

Cited by 29 publications (10 citation statements)
References 36 publications
“…The dereverberation scheme is based on spectral subtraction [12]. In the second stage we employ an efficient model-based source separation technique which is motivated by aspects of the human auditory system by combining the models of interaural level difference (ILD), interaural phase difference (IPD) and the model of mixing vectors [11]. This probabilistic modeling, which is performed in the time-frequency (TF) domain, yields TF soft masks for each source in the mixture that can be used for their reconstruction.…”
Section: Interference Suppression (mentioning)
confidence: 99%
“…The ILD, IPD and mixing vector models are combined and the model parameters are estimated in the maximum likelihood sense using iterative expectation-maximization (EM) [11]. TF masks are generated after a fixed number of iterations and then used to reconstruct the acoustic signals from different sources.…”
Section: Source Localization and Separation (mentioning)
confidence: 99%
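The masking step described in the statement above can be illustrated with a minimal sketch. It assumes that some model has already produced a log-likelihood for each source at every TF point, and it normalizes those values into soft masks (equal source priors are assumed). The function names, array shapes, and random inputs are hypothetical stand-ins, not the cited authors' implementation; the ILD, IPD, and mixing vector models and the EM parameter updates are not reproduced here.

```python
import numpy as np

def soft_masks(log_lik):
    """Turn per-source log-likelihoods of shape (sources, freq, time)
    into soft masks that sum to 1 at every TF point."""
    # Subtract the per-point maximum for numerical stability.
    shifted = log_lik - log_lik.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=0, keepdims=True)

def apply_masks(mixture_stft, masks):
    """Weight the mixture spectrogram by each source's mask."""
    return masks * mixture_stft[np.newaxis, :, :]

rng = np.random.default_rng(0)
n_sources, n_freq, n_frames = 2, 4, 5
# Hypothetical stand-ins: random log-likelihoods and a random complex mixture.
log_lik = rng.normal(size=(n_sources, n_freq, n_frames))
mixture = (rng.normal(size=(n_freq, n_frames))
           + 1j * rng.normal(size=(n_freq, n_frames)))

masks = soft_masks(log_lik)              # one mask per source
estimates = apply_masks(mixture, masks)  # masked spectrogram per source
```

Because the masks sum to one at each TF point, the masked spectrograms add back up to the mixture; an inverse STFT of each one would then give the time-domain estimate for that source.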
“…We used our recently proposed source separation (SS) algorithm [12] for this purpose. BRIRs used in this experiment [10] were measured in four different rooms with RT60s of 0.32, 0.47, 0.68, and 0.89 seconds.…”
Section: Dereverberation and Source Separation (mentioning)
confidence: 99%
“…The complementary information of AV data is truly adopted in audio-visual speech recognition (AVSR) in either of early-(feature), middle-(model), or late-(decoding) stage fusion schemes to enhance robustness against acoustic distortions. In recent years (since 2001 [7]), researchers have proposed methods based on exploiting the coherent component of AV processes for applicable tasks like speech enhancement [7][8][9], acoustic feature enhancement [10], visual voice activity detection (VVAD) [11], and AV source separation (AVSS) [11][12][13][14][15][16][17][18][19][20][21][22][23][24].…”
Section: Introduction (mentioning)
confidence: 99%
“…Khan et al. [24] have proposed a video-aided separation method for two-channel reverberant recordings which estimates the direction of sources via visual localization to be used in probabilistic models which are refined using the EM algorithm and evaluated at discrete TF points to generate separating masks.…”
Section: Introduction (mentioning)
confidence: 99%