2014
DOI: 10.1007/978-3-642-55016-4_12

On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis

Abstract: The ideal binary mask (IBM) is widely considered to be the benchmark for time-frequency-based sound source separation techniques such as computational auditory scene analysis (CASA). However, it is well known that binary masking introduces objectionable distortion, especially musical noise. This can make binary masking unsuitable for sound source separation applications where the output is auditioned. It has been suggested that soft masking reduces musical noise and leads to a higher quality output. A previousl…

Cited by 73 publications (42 citation statements)
References 44 publications
“…The IRM was employed as the training target for supervised speech segregation (Srinivasan et al, 2006;Narayanan and Wang, 2013;Hummersone et al, 2014;Wang et al, 2014). The IRM is defined as…”
Section: IRM Estimation Using DNN
Type: mentioning
confidence: 99%
“…An ideal T-F mask indicates whether, or to what extent, each T-F unit is dominated by target speech. A binary decision leads to the ideal binary mask (IBM; Hu and Wang, 2001;Wang, 2005), whereas a ratio decision leads to the ideal ratio mask (IRM; Srinivasan et al, 2006;Narayanan and Wang, 2013;Hummersone et al, 2014;Wang et al, 2014). Unlike traditional speech enhancement, supervised segregation does not make explicit statistical assumptions about the underlying speech or noise signal, but rather learns data distributions from a training set.…”
Section: Introduction
Type: mentioning
confidence: 99%
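
The binary versus ratio decision described in the statement above can be illustrated with a minimal sketch. It assumes one commonly used formulation, which is not necessarily the exact definition used in the cited papers: the IBM keeps a T-F unit when its local SNR exceeds a threshold, and the IRM is the target's share of the unit energy raised to an exponent. The function names and the parameters `lc_db` and `beta` are hypothetical, chosen only for this illustration.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Binary decision: 1.0 where the local SNR exceeds lc_db, else 0.0."""
    mask = []
    for s_row, n_row in zip(target_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                keep = False          # no target energy in this unit
            elif n <= 0.0:
                keep = True           # target only, SNR is unbounded
            else:
                keep = 10.0 * math.log10(s / n) > lc_db
            row.append(1.0 if keep else 0.0)
        mask.append(row)
    return mask


def ideal_ratio_mask(target_energy, noise_energy, beta=0.5):
    """Ratio decision: (S / (S + N)) ** beta, a soft gain in [0, 1]."""
    mask = []
    for s_row, n_row in zip(target_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0.0 else 0.0)
        mask.append(row)
    return mask


# Toy 2x2 grid of per-unit energies (rows: time frames, columns: frequency bands).
target = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 4.0], [2.0, 0.0]]
ibm = ideal_binary_mask(target, noise)
irm = ideal_ratio_mask(target, noise, beta=1.0)
```

On the toy grid, the binary mask switches each unit fully on or off, while the ratio mask assigns intermediate gains to units where both sources contribute energy.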
“…Given a mixture in STFT domain where the signal in each TF bin either belongs solely to the desired or the undesired signal, extraction can be performed using binary masks [16] (e.g., [6], [8]). Given a mixture in STFT domain where several sources are active in the same TF bin, ratio masks (RMs) [17] or complex ratio masks (CRMs) [18] can be applied. Both assign a gain to each mixture TF bin to estimate the desired spectrum.…”
Section: Introduction
Type: mentioning
confidence: 99%
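
Assigning a gain to each mixture T-F bin, as the statement above describes, amounts to an element-wise product between the mask and the mixture spectrum. The sketch below assumes the STFT is stored as a nested list of complex numbers; `apply_mask` is a hypothetical helper, not an API from any of the cited works.

```python
def apply_mask(mixture_stft, mask):
    """Multiply each mixture T-F bin by its gain.

    A real-valued gain (binary or ratio mask) scales the magnitude and
    keeps the mixture phase; a complex gain (complex ratio mask) can
    also rotate the phase.
    """
    return [
        [gain * value for gain, value in zip(gain_row, value_row)]
        for gain_row, value_row in zip(mask, mixture_stft)
    ]


# One time frame with two frequency bins.
mixture = [[1 + 1j, 2 - 2j]]
ratio_mask = [[0.5, 0.0]]      # real gains in [0, 1]
complex_mask = [[1j, 1.0]]     # first gain rotates the phase by 90 degrees

ratio_estimate = apply_mask(mixture, ratio_mask)
complex_estimate = apply_mask(mixture, complex_mask)
```

The same helper covers binary, ratio, and complex ratio masks because only the type of the gain changes, not the operation.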
“…To solve this problem, the Ideal Ratio Mask (IRM) was proposed in [197] with the aim of smoothing the T-F units rather than removing them. The IRM yields better performance because it is closely related to the Wiener filter [123], where a high SNR value indicates low attenuation of the energy of the T-F units, while a low SNR value indicates high attenuation, smoothing all T-F units instead of removing them as the IBM does.…”
Section: Ideal Ratio Mask (IRM)
Type: unclassified
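
The link between local SNR and attenuation in the statement above can be made concrete with the textbook Wiener-style gain, SNR / (1 + SNR), with the SNR in linear units. This is a sketch under that assumption, not the formula of the cited thesis; both function names are hypothetical.

```python
def wiener_gain_from_snr_db(snr_db):
    """Soft gain in (0, 1): close to 1 at high SNR, close to 0 at low SNR."""
    snr = 10.0 ** (snr_db / 10.0)   # convert dB to a linear power ratio
    return snr / (1.0 + snr)


def binary_gain_from_snr_db(snr_db, threshold_db=0.0):
    """Hard decision for comparison: keep the unit or remove it entirely."""
    return 1.0 if snr_db > threshold_db else 0.0


soft_gains = [wiener_gain_from_snr_db(x) for x in (-20.0, 0.0, 20.0)]
hard_gains = [binary_gain_from_snr_db(x) for x in (-20.0, 0.0, 20.0)]
```

The soft gain attenuates low-SNR units heavily without zeroing them, whereas the hard decision removes them outright.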
“…Using the acoustic representations or feature set, the next step is to identify the units that contain dominant information relative to the noise, in order to group them and label them as reliable units belonging to the same sound. This procedure can be carried out with masks based on local SNR estimation [128] [197], masks based on Bayesian classification of the spectrum [116][222], among others. In this chapter we continue with masks based on local SNR estimation, as in chapters 4 and 5.…”
Section: Chapter 6 Speech Segregation Using the INM Mask Based…
Type: unclassified