2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/waspaa.2017.8169997
Low latency sound source separation using convolutional recurrent neural networks

Cited by 31 publications (27 citation statements)
References 20 publications
“…With the advent of deep neural networks (DNNs), a large improvement in the performance of supervised speech separation has been reported, starting with Wang and Wang (2013). Various network architectures have been employed, e.g., feedforward DNNs (Grais et al., 2014; Xu et al., 2015), recurrent neural networks (Erdogan et al., 2015; Huang et al., 2015; Weninger et al., 2014), deep autoencoders (Lu et al., 2013), convolutional neural networks (Chandna et al., 2017; Park and Lee, 2016), convolutional recurrent neural networks (Naithani et al., 2017), etc. These DNN-based approaches have employed either time-frequency masking (Huang et al., 2015; Weninger et al., 2014; Williamson and Wang, 2017) or spectral mapping (Grais et al., 2014; Park and Lee, 2016; Xu et al., 2014, 2015) approaches.…”
Section: Introduction
confidence: 99%
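The masking-versus-mapping distinction drawn in the excerpt above can be illustrated with a toy sketch. All shapes and the random stand-ins for network outputs here are illustrative assumptions, not taken from any of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrogram of a mixture: (freq_bins, time_frames).
mix_mag = np.abs(rng.standard_normal((257, 100)))

# --- Time-frequency masking: the network outputs a mask in [0, 1]
# that multiplies the mixture magnitude bin by bin.
mask = rng.uniform(0.0, 1.0, size=mix_mag.shape)  # stand-in for a DNN output
masked_estimate = mask * mix_mag

# --- Spectral mapping: the network outputs the target magnitude directly,
# with no constraint tying it to the mixture.
mapped_estimate = np.abs(rng.standard_normal(mix_mag.shape))  # stand-in

# A masked estimate can never exceed the mixture magnitude.
assert np.all(masked_estimate <= mix_mag)
```

The final assertion captures the structural difference: masking bounds each estimated bin by the corresponding mixture bin, whereas a mapped estimate is unconstrained.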
“…This DNN predicts a soft mask that is applied to the mixture to yield magnitude estimates. We compare oMISI to its offline counterpart and to the amplitude mask (AM) used as a baseline [20].…”
Section: A Dataset and Protocol
confidence: 99%
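The amplitude mask (AM) baseline mentioned in this excerpt is conventionally the ratio of target to mixture magnitudes. A minimal sketch follows; the epsilon and clipping ceiling are my own illustrative choices, not from the cited paper:

```python
import numpy as np

def amplitude_mask(clean_mag, mix_mag, eps=1e-8, clip=1.0):
    """Amplitude mask |S| / |X|, optionally clipped.

    clean_mag, mix_mag: nonnegative magnitude spectrograms of the target
    source and the mixture, same shape. eps avoids division by zero and
    clip bounds the mask; both are illustrative assumptions.
    """
    return np.minimum(clean_mag / (mix_mag + eps), clip)

# Toy example: one time-frequency bin where the target is half the mixture.
clean = np.array([[2.0]])
mix = np.array([[4.0]])
mask = amplitude_mask(clean, mix)
estimate = mask * mix  # apply the soft mask to the mixture magnitude
```

Applying the mask back to the mixture recovers the target magnitude in this toy case (`estimate` is approximately 2.0).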
“…Considering practical situations in the real world, some research on DNN-based speech enhancement has focused on real-time application [19][20][21][22][23]. To apply an enhancement method in real time, the system must be causal, i.e., it uses only past information and does not require future information to estimate the enhanced signal.…”
Section: Introduction
confidence: 99%
“…To apply an enhancement method in real time, the system must be causal, i.e., it uses only past information and does not require future information to estimate the enhanced signal. Therefore, uni-directional LSTMs are often used in that task [19][20][21][22]. However, as the price of mitigating the vanishing gradient problem, an LSTM contains many parameters, as in Fig.…”
Section: Introduction
confidence: 99%
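The causality constraint described above (the output at frame t may depend only on frames up to t) can be demonstrated with a minimal uni-directional recurrence in plain NumPy. The single tanh recurrence below is a simplified stand-in for the uni-directional LSTMs of the cited systems, not their actual architecture:

```python
import numpy as np

def causal_recurrence(frames, w_in, w_rec, b):
    """Process frames left to right; each output uses only past/current input.

    frames: (T, d_in) sequence of feature frames.
    w_in:   (d_in, d_h) input weights; w_rec: (d_h, d_h) recurrent weights.
    A plain tanh recurrence standing in for a uni-directional LSTM.
    """
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for x in frames:  # strictly left to right: no future frames are read
        h = np.tanh(x @ w_in + h @ w_rec + b)
        outputs.append(h.copy())
    return np.stack(outputs)

# Causality check: changing a future frame must not change earlier outputs.
rng = np.random.default_rng(0)
T, d_in, d_h = 6, 4, 3
frames = rng.standard_normal((T, d_in))
w_in = rng.standard_normal((d_in, d_h))
w_rec = rng.standard_normal((d_h, d_h))
b = np.zeros(d_h)

out_a = causal_recurrence(frames, w_in, w_rec, b)
frames_b = frames.copy()
frames_b[-1] += 10.0  # perturb only the last frame
out_b = causal_recurrence(frames_b, w_in, w_rec, b)
assert np.allclose(out_a[:-1], out_b[:-1])  # outputs before the last frame unchanged
```

A bi-directional network would fail this check, since its backward pass propagates the perturbed final frame into every earlier output, which is why causal, uni-directional models are preferred for real-time enhancement.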