“…With the advent of deep neural networks (DNNs), a large improvement in the performance of supervised speech separation has been reported, starting with Wang and Wang (2013). Various network architectures have been employed, e.g., feedforward DNNs (Grais et al, 2014; Xu et al, 2015), recurrent neural networks (Erdogan et al, 2015; Huang et al, 2015; Weninger et al, 2014), deep autoencoders (Lu et al, 2013), convolutional neural networks (Chandna et al, 2017; Park and Lee, 2016), and convolutional recurrent neural networks (Naithani et al, 2017). These DNN-based approaches have employed either time-frequency masking (Huang et al, 2015; Weninger et al, 2014; Williamson and Wang, 2017) or spectral mapping (Grais et al, 2014; Park and Lee, 2016; Xu et al, 2014, 2015).…”