2016
DOI: 10.1109/taslp.2015.2512042

Complex Ratio Masking for Monaural Speech Separation

Abstract: Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech and enhance only the magnitude spectrum, leaving the phase spectrum unchanged. This is because the phase spectrum was long believed to be unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to consider enhancing both the magnitude and phase spectra. We present a supervised monaural speech separation approach that …
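The core idea of complex ratio masking can be illustrated with the ideal (oracle) complex mask: a complex-valued M that satisfies M · Y = S for a noisy STFT Y and clean STFT S, so its real and imaginary parts follow directly from complex division. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation. It computes only the training target, not the DNN that estimates it. The tanh-style compression with K = 10 and C = 0.1 is an assumption about how the unbounded mask is squashed into a trainable range. All function names are hypothetical.

```python
import numpy as np


def complex_ratio_mask(clean_stft, noisy_stft, eps=1e-12):
    """Ideal complex ratio mask (M_r, M_i) such that (M_r + j M_i) * Y = S.

    Derived from S / Y = S * conj(Y) / |Y|^2, split into real and imaginary parts.
    """
    yr, yi = noisy_stft.real, noisy_stft.imag
    sr, si = clean_stft.real, clean_stft.imag
    denom = yr ** 2 + yi ** 2 + eps  # eps guards against silent (zero) bins
    m_real = (yr * sr + yi * si) / denom
    m_imag = (yr * si - yi * sr) / denom
    return m_real, m_imag


def compress(m, k=10.0, c=0.1):
    # Assumed compression K * (1 - exp(-C m)) / (1 + exp(-C m)), rewritten as
    # K * tanh(C m / 2) for numerical stability; output lies within [-K, K].
    return k * np.tanh(c * m / 2.0)


def uncompress(q, k=10.0, c=0.1):
    # Inverse of compress(); q must lie strictly inside (-K, K).
    return (2.0 / c) * np.arctanh(q / k)


def apply_mask(m_real, m_imag, noisy_stft):
    # Complex multiplication changes both the magnitude and the phase of Y.
    return (m_real + 1j * m_imag) * noisy_stft


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (257, 20)  # (frequency bins, frames), arbitrary toy size
    S = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    N = 0.5 * (rng.normal(size=shape) + 1j * rng.normal(size=shape))
    Y = S + N
    mr, mi = complex_ratio_mask(S, Y)
    # Round-trip the target through compression, as a network output would be.
    mr_hat, mi_hat = uncompress(compress(mr)), uncompress(compress(mi))
    S_hat = apply_mask(mr_hat, mi_hat, Y)
    print("max reconstruction error:", np.max(np.abs(S_hat - S)))
```

Because the mask is complex, multiplying it with the noisy STFT corrects phase as well as magnitude, which a real-valued magnitude mask cannot do.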

Cited by 685 publications (337 citation statements)
References: 30 publications
“…Erdogan et al [9] and Weninger et al [34] further extended the SA to a phase-sensitive case and used LSTM for speech denoising. Williamson et al [36] proposed complex ratio masking for DNN based monaural speech separation, which learns the real and imaginary components of complex spectrograms jointly in the Cartesian coordinate system instead of learning magnitude spectrograms only in the traditional polar coordinate system. The method improves speech quality significantly.…”
Section: Introduction (mentioning)
confidence: 99%
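The Cartesian-versus-polar contrast in this statement can be made concrete. A commonly cited form of the phase-sensitive mask associated with Erdogan et al. is |S|/|Y| · cos(θ_S - θ_Y), a real-valued gain applied to the noisy STFT. This formula is supplied here as background and is not taken from the quoted text. Since S/Y = (|S|/|Y|) e^{j(θ_S - θ_Y)}, that mask equals the real part of the complex ratio mask, which additionally carries an imaginary part. The NumPy sketch below (function names are hypothetical) checks this numerically and shows that only the complex mask recovers S exactly from Y.

```python
import numpy as np


def phase_sensitive_mask(clean_stft, noisy_stft, eps=1e-12):
    # Polar view: magnitude ratio scaled by the cosine of the phase difference.
    ratio = np.abs(clean_stft) / (np.abs(noisy_stft) + eps)
    return ratio * np.cos(np.angle(clean_stft) - np.angle(noisy_stft))


def complex_ratio_mask(clean_stft, noisy_stft, eps=1e-12):
    # Cartesian view: real and imaginary parts of S / Y = S * conj(Y) / |Y|^2.
    m = clean_stft * np.conj(noisy_stft) / (np.abs(noisy_stft) ** 2 + eps)
    return m.real, m.imag


rng = np.random.default_rng(0)
S = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))
Y = S + 0.8 * (rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3)))

psm = phase_sensitive_mask(S, Y)
m_real, m_imag = complex_ratio_mask(S, Y)

# The phase-sensitive mask coincides with the real part of the complex mask.
print(np.allclose(psm, m_real))
# A real mask only rescales Y and keeps the noisy phase, so it cannot
# reproduce S exactly; the complex mask can.
print(np.allclose(psm * Y, S), np.allclose((m_real + 1j * m_imag) * Y, S))
```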
“…R = 246). After including temporal correlations, the feature vector has the dimensionality of 246 × (2p + 1) = 246 × 5 = 1230 (p is set to 2 based on our prior study [42]). Therefore, the input layer of the DNN has 1230 units.…”
Section: Evaluations and Results (mentioning)
confidence: 99%
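The dimensionality arithmetic in this statement comes from splicing each frame's 246-dimensional feature vector with its p neighboring frames on each side, giving 246 × (2p + 1) inputs per frame. A minimal NumPy sketch of such context splicing follows. The function name is hypothetical, and padding the edges by repeating the first and last frames is an assumption, since the quoted text does not say how boundary frames are handled.

```python
import numpy as np


def splice_context(features, p=2):
    """Stack each frame with its p preceding and p following frames.

    features: array of shape (num_frames, feature_dim).
    Returns an array of shape (num_frames, feature_dim * (2 * p + 1)),
    ordered from frame t - p to frame t + p.
    """
    num_frames = features.shape[0]
    # Assumption: repeat the first/last frame to fill missing context at the edges.
    padded = np.pad(features, ((p, p), (0, 0)), mode="edge")
    windows = [padded[offset:offset + num_frames] for offset in range(2 * p + 1)]
    return np.concatenate(windows, axis=1)


feats = np.random.default_rng(0).normal(size=(100, 246))  # 100 frames, 246-dim features
spliced = splice_context(feats, p=2)
print(spliced.shape)  # (100, 1230): 246 * (2 * 2 + 1) input units per frame
```

With p = 2 this yields 1230 values per frame, matching the size of the DNN input layer stated above.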
“…The features are computed for each time frame of the signal. A variant of this feature set has been shown to be effective for speech separation [38], and they have recently been shown to work well for cIRM estimation [42]. …”
Section: Algorithm Description (mentioning)
confidence: 99%