2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)
DOI: 10.1109/icmla.2018.00123

Denoising Auto-Encoder with Recurrent Skip Connections and Residual Regression for Music Source Separation

Abstract: Convolutional neural networks with skip connections have shown good performance in music source separation. In this work, we propose a denoising Auto-encoder with Recurrent skip Connections (ARC). We use 1D convolution along the temporal axis of the time-frequency feature map in all layers of the fully-convolutional network. The use of 1D convolution makes it possible to apply recurrent layers to the intermediate outputs of the convolution layers. In addition, we also propose an enhancement network and a residual regression …
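To make the architecture sketched in the abstract concrete, here is a minimal PyTorch sketch (not the authors' code): 1D convolutions run along the time axis of a magnitude spectrogram, with frequency bins treated as channels, and a GRU is applied to an intermediate feature map to form a recurrent skip connection. All layer sizes and names (ARCSketch, RecurrentSkip, hidden=256) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentSkip(nn.Module):
    """Skip connection that passes an intermediate feature map through a
    recurrent layer (here a GRU) before it is re-added in the decoder."""
    def __init__(self, channels):
        super().__init__()
        self.rnn = nn.GRU(channels, channels, batch_first=True)

    def forward(self, x):                    # x: (batch, channels, time)
        y, _ = self.rnn(x.transpose(1, 2))   # GRU runs along the time axis
        return y.transpose(1, 2)

class ARCSketch(nn.Module):
    """Minimal encoder/decoder with one recurrent skip connection.
    Input: magnitude spectrogram of shape (batch, freq_bins, time)."""
    def __init__(self, freq_bins=513, hidden=256):
        super().__init__()
        # 1D convolution along time; frequency bins act as input channels
        self.enc = nn.Sequential(
            nn.Conv1d(freq_bins, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.skip = RecurrentSkip(hidden)
        self.dec = nn.Sequential(
            nn.Conv1d(hidden, freq_bins, kernel_size=3, padding=1), nn.ReLU())

    def forward(self, x):
        h = self.enc(x)
        return self.dec(h + self.skip(h))    # recurrent skip added back

net = ARCSketch()
print(net(torch.rand(2, 513, 100)).shape)    # torch.Size([2, 513, 100])
```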

Year published: 2019-2024

Cited by 37 publications (29 citation statements)
References 19 publications
“…Second, unlike prior arts (including [16]), we investigate one additional way to employ SS to improve SID. Given the separated vocal tracks and instrumental tracks of the audio recordings in the training set, we perform the so-called "data augmentation" [19]-[22] by randomly shuffling the separated tracks of different songs and then remixing them. For example, we remix the vocal part of a song from a singer with the instrumental part of another song from a different singer.…”
Section: Conv Block
confidence: 99%
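The shuffle-and-remix augmentation quoted above is simple to state in code. A minimal NumPy sketch, assuming each song has already been separated into equal-length vocal and accompaniment waveforms (all names here are hypothetical):

```python
import numpy as np

def shuffle_remix(vocals, accomps, rng=None):
    """Remix augmentation: pair each separated vocal track with the
    accompaniment of a randomly chosen (typically different) song,
    creating mixtures whose singer/backing pairing never occurred in
    the original data. Both arguments are lists of equal-length 1D
    waveform arrays, index-aligned by song."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(accomps))      # shuffle the accompaniments
    return [vocals[i] + accomps[j] for i, j in enumerate(perm)]
```

In practice the remixed clips are added to, or sampled on the fly during, training, multiplying the number of distinct mixtures the model sees.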
“…This technique has been popular for some time among the machine learning community. It has also been shown beneficial for MIR tasks such as singing voice detection and source separation [20]-[22] (but not yet for SID).…”
Section: Data Augmentation: Separate Shuffle and Remix
confidence: 99%
“…Mean square error is used as the loss function for updating the network, and Adam [Kingma and Ba, 2015] is used to update the weights. As data augmentation has been found useful in the literature [Takahashi et al., 2018; Uhlich et al., 2017; Liu and Yang, 2018], we use data augmentation in the training process by randomly shuffling the audio clips in each source and then collecting the audio clips from the four sources in the shuffled orders. A mixture clip is formed by summing the collected source clips.…”
Section: Evaluation Setup
confidence: 99%
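Read literally, the quoted setup amounts to the training step below: shuffle each source independently across the batch, sum the shuffled stems into new mixtures, and minimize the mean square error with Adam. This PyTorch sketch assumes a model mapping a mixture spectrogram (batch, freq, time) to four source estimates (batch, 4, freq, time); all shapes and names are assumptions, and the real pipeline may shuffle waveforms before the STFT rather than spectrograms.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, sources):
    """One training step with shuffle-and-remix augmentation.
    sources: (batch, 4, freq, time) spectrograms of the four stems.
    Each stem is shuffled independently across the batch, the shuffled
    stems are summed into new mixtures, and the model is trained to
    recover the stems from those mixtures with an MSE loss."""
    b = sources.size(0)
    shuffled = torch.stack(
        [sources[torch.randperm(b), j] for j in range(sources.size(1))],
        dim=1)                                # (batch, 4, freq, time)
    mixture = shuffled.sum(dim=1)             # remix: (batch, freq, time)
    pred = model(mixture)                     # assumed (batch, 4, freq, time)
    loss = F.mse_loss(pred, shuffled)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                          # Adam, per the quoted setup
    return loss.item()
```

Here `optimizer` would be `torch.optim.Adam(model.parameters())` to match the quoted configuration.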
“…The phases of the mixture complex spectrograms are used with the predicted spectrogram magnitudes to construct the complex spectrogram. Before converting back to waveforms, a multi-channel Wiener filter is applied to the complex spectrograms, as widely done in recent source separation systems [Nugraha et al., 2016; Uhlich et al., 2017; Takahashi et al., 2018; Liu and Yang, 2018]. Table 2 shows the performance of the proposed models and the top-performing models of SiSEC2018 [Rafii et al., 2018].…”
Section: Evaluation Setup
confidence: 99%
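The reconstruction step described in this quote can be illustrated with a single-channel analogue: keep the mixture phase and sharpen the predicted magnitudes with a Wiener-style soft mask before the inverse STFT. The NumPy sketch below is a simplified stand-in, not the multi-channel EM-based filter of Nugraha et al.; all names are assumptions.

```python
import numpy as np

def wiener_reconstruct(mix_stft, mag_estimates, eps=1e-10):
    """Wiener-style soft masking: the predicted magnitudes define a
    power-ratio mask per source, and multiplying the complex mixture
    STFT both applies the mask and reuses the mixture phase.
    mix_stft:      complex mixture STFT, shape (freq, time)
    mag_estimates: predicted magnitudes, shape (n_sources, freq, time)
    Returns complex STFT estimates, shape (n_sources, freq, time)."""
    power = mag_estimates ** 2
    masks = power / (power.sum(axis=0, keepdims=True) + eps)
    return masks * mix_stft[None]             # ready for inverse STFT
```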
“…There are generally three basic structures to construct DNNs: Feed-Forward Network (FFN) [4], Recurrent Neural Network (RNN) [5], and Convolutional Neural Network (CNN) [6], [7]. Recently the RNN and CNN have been combined to improve the MSS [8], [9].…”
Section: Introduction
confidence: 99%