“…An ideal T-F mask indicates whether, or to what extent, each T-F unit is dominated by target speech. A binary decision leads to the ideal binary mask (IBM; Hu and Wang, 2001;Wang, 2005), whereas a ratio decision leads to the ideal ratio mask (IRM; Srinivasan et al, 2006;Narayanan and Wang, 2013;Hummersone et al, 2014;Wang et al, 2014). Unlike traditional speech enhancement, supervised segregation does not make explicit statistical assumptions about the underlying speech or noise signal, but rather learns data distributions from a training set.…”