2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC)
DOI: 10.1109/iwaenc.2018.8521371

Harmonic-Percussive Source Separation with Deep Neural Networks and Phase Recovery

Abstract: Harmonic/percussive source separation (HPSS) consists in separating the pitched instruments from the percussive parts in a music mixture. In this paper, we propose to apply the recently introduced Masker-Denoiser with twin networks (MaD TwinNet) system to this task. MaD TwinNet is a deep learning architecture that has reached state-of-the-art results in monaural singing voice separation. Herein, we propose to apply it to HPSS by using it to estimate the magnitude spectrogram of the percussive source. Then, we …

Cited by 13 publications (13 citation statements)
References 25 publications

“…We showed that the mask type used for HPS stage has an effect on the source estimation quality. Hence, different HPS methods, for instance, [3,5], should be investigated. We showed that the mask type at the final masking stage needs to be chosen to optimise for a given source estimation algorithm.…”
Section: Discussion (mentioning)
confidence: 99%
“…Although this method considers phase information, the phase of the audio mixture is still utilized for resynthesizing the separated time domain signals. In contrast, the method presented in [19] utilizes the sinusoidal model for modifying phase. Specifically, the time-frequency mask is estimated by a deep neural network, and phases of separated signals are estimated by the specific algorithm based on the sinusoidal model [13].…”
Section: HPSS Based on Sinusoidal Model (mentioning)
confidence: 99%
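
The statement above contrasts resynthesis with the mixture phase against resynthesis with a recovered phase. A minimal sketch of that difference, assuming a complex mixture STFT and a real-valued mask are already available as NumPy arrays; the function name `apply_mask` and its parameters are hypothetical and are not taken from the cited papers:

```python
import numpy as np

def apply_mask(mixture_stft, mask, phase=None):
    """Estimate one source's complex STFT from a time-frequency mask.

    mixture_stft: complex array (bins, frames), STFT of the mixture.
    mask:         real array of the same shape, e.g. a network output.
    phase:        optional real array of the same shape (radians). If it
                  is omitted, the mixture phase is reused unchanged.
    """
    magnitude = mask * np.abs(mixture_stft)
    if phase is None:
        # Baseline: keep the (degraded) phase of the mixture.
        phase = np.angle(mixture_stft)
    return magnitude * np.exp(1j * phase)
```

When `phase` is omitted, the result equals `mask * mixture_stft`, which is the baseline the quoted statements describe; passing a phase from a recovery algorithm changes only the argument of each bin, not its magnitude.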
“…However, the phase is not modified through time-frequency masking, and thus the degraded phase from the audio mixture is utilized for resynthesizing back to the time-domain. On the other hand, [19] utilizes the sinusoidal model for recovering the phase after applying a time-frequency mask obtained by a deep neural network. However, simultaneous modification of both amplitude and phase has not been presented for HPSS.…”
Section: Introduction (mentioning)
confidence: 99%
“…Phase recovery, which tries to obtain a phase spectrogram that is reasonable with respect to the given amplitude, has been applied in signal enhancement [9][10][11][12][13][14], generation [15][16][17][18], and separation [19][20][21]. It has also been applied to HPSS after the estimation of the amplitude spectrograms [22,23] for improving the quality of separation. A popular phase-recovery method is temporal linear phase unwrapping (PU) [24], which recursively calculates the subsequent phase from the instantaneous frequency based on the sinusoidal model, as described in Sect.…”
Section: Introduction (mentioning)
confidence: 99%
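
The recursion described in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from [24]: the instantaneous frequencies are assumed to be given per bin and frame in cycles per sample, and the name `unwrap_phase_linear` and its parameters are hypothetical.

```python
import numpy as np

def unwrap_phase_linear(init_phase, inst_freq, hop):
    """Propagate phase over time under a sinusoidal model.

    init_phase: real array (bins,), phase of the first frame in radians.
    inst_freq:  real array (bins, frames), assumed instantaneous
                frequency in cycles per sample.
    hop:        hop size between frames, in samples.
    """
    n_bins, n_frames = inst_freq.shape
    phase = np.empty((n_bins, n_frames))
    phase[:, 0] = init_phase
    for t in range(1, n_frames):
        # A sinusoid at frequency nu advances by 2*pi*nu*hop radians
        # between two consecutive frames.
        phase[:, t] = phase[:, t - 1] + 2.0 * np.pi * hop * inst_freq[:, t]
    return phase
```

Each frame costs one multiply-add per bin, which is consistent with the low computational cost attributed to PU in the citing papers. The returned phase is not wrapped to the principal interval; that does not matter once it is passed through a complex exponential.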
“…PU is a computationally efficient method, which makes it popular in HPSS because HPSS is mainly used as preprocessing whose computation should be cheap. In HPSS, it is usually applied to the harmonic components, and the phase of the percussive components is intact or iteratively estimated using enhanced harmonic components [23].…”
Section: Introduction (mentioning)
confidence: 99%