2019
DOI: 10.1109/jstsp.2019.2904183

Phasebook and Friends: Leveraging Discrete Representations for Source Separation

Abstract: Deep-learning-based speech enhancement and source separation systems have recently reached unprecedented levels of quality, to the point that performance is reaching a new ceiling. Most systems rely on estimating the magnitude of a target source by estimating a real-valued mask to be applied to a time-frequency representation of the mixture signal. A limiting factor in such approaches is a lack of phase estimation: the phase of the mixture is most often used when reconstructing the estimated time-domain signal…
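The masking scheme described in the abstract can be illustrated for a single frame of complex T-F bins. Multiplying a complex bin by a positive real mask value only rescales its magnitude, so the estimate always carries the mixture phase. In this minimal sketch the bin values are hypothetical, and the mask is an oracle magnitude ratio |S|/|X| standing in for what a DNN would estimate; the helper name `apply_real_mask` is also hypothetical.

```python
import cmath


def apply_real_mask(mixture_tf, mask):
    """Scale each complex T-F bin of the mixture by a real, non-negative mask value.

    A positive real factor changes only the magnitude, so every output bin
    inherits the phase of the mixture.
    """
    return [m * x for m, x in zip(mask, mixture_tf)]


# Hypothetical single frame with three T-F bins (complex STFT values).
target = [cmath.rect(1.0, 0.3), cmath.rect(0.5, -1.2), cmath.rect(0.2, 2.0)]
noise = [cmath.rect(0.4, 1.5), cmath.rect(0.5, 0.4), cmath.rect(0.8, -2.5)]
mixture = [s + n for s, n in zip(target, noise)]

# Oracle magnitude-ratio mask |S| / |X| (assumption); in practice a DNN estimates it.
mask = [abs(s) / abs(x) for s, x in zip(target, mixture)]
estimate = apply_real_mask(mixture, mask)

for s, x, e in zip(target, mixture, estimate):
    print(f"|S|={abs(s):.2f} |S_hat|={abs(e):.2f}  "
          f"phase S={cmath.phase(s):+.2f}  phase S_hat={cmath.phase(e):+.2f}  "
          f"phase X={cmath.phase(x):+.2f}")
```

Even with this oracle mask the magnitudes match exactly while the phases match the mixture, not the target, which is the limitation the abstract points to.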


Citations: cited by 72 publications (69 citation statements); references 30 publications.
“…Motivated by the recent advance in deep learning, several DNN-based phase reconstruction methods have been presented [18][19][20][21][22][23]. However, phase reconstruction from a given amplitude spectrogram is not an easy task for DNNs due to the following two problems: the wrapping effect and sensitivity to a shift of a waveform.…”
Section: Phase Reconstruction via DNN (mentioning, confidence: 99%)
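The wrapping effect can be shown in a few lines. Phase is only defined modulo 2π, so two angles that describe almost the same point on the unit circle can be almost 2π apart numerically, and a loss on raw angles would punish a near-perfect prediction heavily. This sketch compares a naive absolute difference with a wrapped one and with the chord distance between the corresponding unit phasors. The helper name `wrap` and the example angles are hypothetical, and wrapping the error is only one common workaround, not necessarily what the cited works do.

```python
import math


def wrap(angle):
    """Map an angle to the principal interval (-pi, pi]."""
    wrapped = math.remainder(angle, 2 * math.pi)  # result lies in [-pi, pi]
    return math.pi if wrapped == -math.pi else wrapped


true_phase = math.pi - 0.01
predicted_phase = -math.pi + 0.01  # almost the same point on the unit circle

naive_error = abs(predicted_phase - true_phase)           # about 2*pi - 0.02
wrapped_error = abs(wrap(predicted_phase - true_phase))   # about 0.02
chord_error = math.hypot(math.cos(predicted_phase) - math.cos(true_phase),
                         math.sin(predicted_phase) - math.sin(true_phase))

print(f"naive={naive_error:.4f} wrapped={wrapped_error:.4f} chord={chord_error:.4f}")
```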
“…To take advantage of more knowledge about the target signal, deep neural network (DNN)-based phase reconstruction [18][19][20][21][22][23] has gained increasing attention. Although DNNs have strong modeling capability and learn rich knowledge from training data, DNN-based phase reconstruction has the following two problems: the wrapping effect and sensitivity to a waveform shift.…”
Section: Introduction (mentioning, confidence: 99%)
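Sensitivity to a waveform shift follows from the DFT shift theorem: delaying a length-N signal circularly by τ samples multiplies bin k by exp(-j2πkτ/N). The magnitude is unchanged, but the phase of every bin rotates by an amount that grows with k, so the same sound yields very different phase targets. This sketch uses a naive O(N²) DFT on a hypothetical two-tone signal; for STFT frames the relation holds only approximately, because a shift is not circular within a window.

```python
import cmath
import math


def dft(x):
    """Naive DFT: X[k] = sum_t x[t] * exp(-j*2*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_shift(x, tau):
    """Delay x by tau samples circularly: y[t] = x[t - tau]."""
    n = len(x)
    return [x[(t - tau) % n] for t in range(n)]


N = 16
signal = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
          for t in range(N)]
shifted = circular_shift(signal, 1)  # one-sample delay

X, Y = dft(signal), dft(shifted)
for k in (3, 5):
    print(f"bin {k}: |X|={abs(X[k]):.3f} |Y|={abs(Y[k]):.3f} "
          f"phase change={cmath.phase(Y[k] / X[k]):+.4f} "
          f"expected={-2 * math.pi * k / N:+.4f}")
```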
“…One approach is to estimate magnitude and phase or a phase representation either directly [5] or use the estimated magnitude and noisy phase to predict the clean phase [8]. Estimating the clean phase directly, however, is quite hard, because of its spontaneous, random-like nature.…”
Section: Related Work (mentioning, confidence: 99%)
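Why the noisy phase alone is limiting can be made concrete for one T-F bin. Even with a perfect magnitude |S|, pairing it with the mixture phase θ_X instead of the clean phase θ_S leaves an error |S|·|exp(jθ_X) - exp(jθ_S)| = 2|S|·|sin((θ_X - θ_S)/2)|, which vanishes only when the two phases agree. This sketch checks that identity numerically on hypothetical bin values; the function names are illustrative only.

```python
import cmath
import math


def noisy_phase_error(target, mixture):
    """Error of the estimate |S| * exp(j*angle(X)): perfect magnitude, mixture phase."""
    estimate = cmath.rect(abs(target), cmath.phase(mixture))
    return abs(estimate - target)


def closed_form_error(target, mixture):
    """Same quantity via 2*|S|*|sin((theta_X - theta_S)/2)|; needs no phase wrapping."""
    delta = cmath.phase(mixture) - cmath.phase(target)
    return 2 * abs(target) * abs(math.sin(delta / 2))


target = cmath.rect(1.0, 0.3)
mixture = target + cmath.rect(0.4, 1.5)            # noise with a different phase
in_phase_mixture = target + cmath.rect(0.4, 0.3)   # noise aligned with the target

print(noisy_phase_error(target, mixture), closed_form_error(target, mixture))
print(noisy_phase_error(target, in_phase_mixture))
```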
“…Due to a superposition of multiple, quasi static signals within one frequency bin, not only the phase changes over time but also the magnitude as a result of cancellation. Thus, low-res spectrograms limit the effectiveness of standard complex valued processing methods that mainly bring phase improvement [5,8].…”
Section: Introduction (mentioning, confidence: 99%)
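The cancellation described above can be sketched with an idealized bin that contains two quasi-static tones whose frequencies are too close to be resolved by a coarse spectrogram. The bin value over time is then roughly a sum of two rotating phasors, and its magnitude beats: for equal amplitudes and a frequency gap Δf it equals 2|cos(πΔf·t)|, dropping to zero at t = 1/(2Δf). The tone frequencies are hypothetical, and STFT window effects are ignored (an assumption).

```python
import cmath
import math


def bin_value(t, components):
    """Idealized value of one T-F bin at time t: sum of a * exp(j*2*pi*f*t).

    Window effects of a real STFT are ignored (simplifying assumption).
    """
    return sum(a * cmath.exp(2j * math.pi * f * t) for a, f in components)


components = [(1.0, 1000.0), (1.0, 1004.0)]  # two hypothetical tones 4 Hz apart

for t in (0.0, 0.0625, 0.125, 0.25):
    magnitude = abs(bin_value(t, components))
    expected = 2 * abs(math.cos(math.pi * 4.0 * t))  # closed-form beat envelope
    print(f"t={t:.4f} s  |bin|={magnitude:.3f}  expected={expected:.3f}")
```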
“…Recently, speech enhancement is advanced by the use of a deep neural network (DNN) to estimate a T-F mask. For effectively modelling a speech signal which is time-sequential data, a recurrent neural network (RNN) is used in various speech signal processing applications [1][2][3][4][5][6][7][8][9][10][11][12][13][14].…”
Section: Introduction (mentioning, confidence: 99%)
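A recurrent T-F mask estimator of the kind mentioned above can be sketched with a single-layer Elman RNN. It reads log-magnitude frames one at a time, carries a hidden state across frames, and emits a sigmoid mask per frame that is multiplied with the mixture magnitude. Everything here is hypothetical: the function names, the toy sizes, and the random untrained weights. It only shows the data flow and is not the architecture of any cited work.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_mask(frames, w_in, w_rec, w_out, b_h, b_out):
    """Run an Elman RNN over magnitude frames and emit one mask per frame.

    frames: T frames, each a list of F magnitudes.
    Returns T masks, each a list of F values in (0, 1).
    """
    hidden = len(b_h)
    h = [0.0] * hidden
    masks = []
    for frame in frames:
        x = [math.log(mag + 1e-8) for mag in frame]  # log-magnitude features
        # New hidden state from the current input and the previous state.
        h = [math.tanh(sum(w_in[i][f] * x[f] for f in range(len(x)))
                       + sum(w_rec[i][j] * h[j] for j in range(hidden))
                       + b_h[i])
             for i in range(hidden)]
        # Sigmoid output layer gives a bounded mask value per frequency bin.
        masks.append([sigmoid(sum(w_out[f][i] * h[i] for i in range(hidden)) + b_out[f])
                      for f in range(len(b_out))])
    return masks


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
F, H, T = 4, 3, 5  # frequency bins, hidden units, frames (toy sizes)
w_in, w_rec, w_out = rand_matrix(H, F), rand_matrix(H, H), rand_matrix(F, H)
b_h, b_out = [0.0] * H, [0.0] * F

mixture_mag = [[random.uniform(0.1, 1.0) for _ in range(F)] for _ in range(T)]
masks = rnn_mask(mixture_mag, w_in, w_rec, w_out, b_h, b_out)
estimate_mag = [[m * x for m, x in zip(mask_t, frame)]
                for mask_t, frame in zip(masks, mixture_mag)]
```

Because the mask is bounded in (0, 1), the masked magnitude can never exceed the mixture magnitude; the phase would still come from the mixture.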