2020
DOI: 10.3390/sym12061051

Monaural Singing Voice and Accompaniment Separation Based on Gated Nested U-Net Architecture

Abstract: This paper proposes a separation model adopting a gated nested U-Net (GNU-Net) architecture, which is essentially a deeply supervised symmetric encoder–decoder network that can generate full-resolution feature maps. Through a series of nested skip pathways, it can reduce the semantic gap between the feature maps of the encoder and decoder subnetworks. In the GNU-Net architecture, only the backbone (not the nested part) uses gated linear units (GLUs) in place of conventional convolutional layers. The …
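
The abstract leaves implementation details to the full paper, but the core gating idea, replacing a plain convolution in the backbone with a gated linear unit, can be sketched as follows. This is a minimal PyTorch illustration under assumed layer sizes; the GatedConv2d name and all dimensions are hypothetical choices for the sketch, not the authors' code.

    import torch
    import torch.nn as nn

    class GatedConv2d(nn.Module):
        """Convolutional gated linear unit (GLU): one convolution produces twice
        the target channels, which are split into a value path and a sigmoid gate."""
        def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
            super().__init__()
            # A single convolution yields both the values and the gates.
            self.conv = nn.Conv2d(in_channels, 2 * out_channels, kernel_size, padding=padding)

        def forward(self, x):
            values, gates = self.conv(x).chunk(2, dim=1)   # split along the channel axis
            return values * torch.sigmoid(gates)           # GLU: value * sigmoid(gate)

    # Example: one backbone stage applied to a magnitude-spectrogram tensor of shape
    # (batch, channels, frequency bins, time frames); the sizes are illustrative.
    block = GatedConv2d(in_channels=1, out_channels=16)
    spec = torch.randn(4, 1, 512, 128)
    out = block(spec)
    print(out.shape)  # torch.Size([4, 16, 512, 128])

Splitting one convolution's output into a value path and a sigmoid gate lets the network learn which time-frequency regions to pass forward, which is the role the GLUs play in the backbone described above.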

Cited by 6 publications (6 citation statements); references 32 publications.
“…Jansson et al [4] decompose an audio signal by converting it to an image, processing it with a U-Net neural network, and applying the resulting spectral mask. GNU-Net [10] leverages a supervised symmetric encoder-decoder architecture for generating full-resolution feature maps. SVSGAN [14] leverages a generative adversarial network with a time-frequency masking function for singing voice separation.…”
Section: Related Work (mentioning)
Confidence: 99%
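
The spectral-masking pipeline summarized in this citation (spectrogram in, learned mask out, masked spectrogram back to audio) can be illustrated with a generic sketch. The file name and the stand-in mask below are assumptions for illustration only, not code from any of the cited works.

    import numpy as np
    import librosa

    # Generic time-frequency masking pipeline: mixture -> magnitude spectrogram ->
    # mask from a model -> masked spectrogram -> waveform using the mixture phase.
    mix, sr = librosa.load("mixture.wav", sr=None, mono=True)  # hypothetical input file
    stft = librosa.stft(mix, n_fft=1024, hop_length=256)
    mag, phase = np.abs(stft), np.angle(stft)

    # Stand-in for a trained separator (e.g. a U-Net) that would predict a soft
    # mask in [0, 1] with the same shape as the magnitude spectrogram.
    mask = np.clip(mag / (mag.max() + 1e-8), 0.0, 1.0)

    vocals_mag = mask * mag
    vocals = librosa.istft(vocals_mag * np.exp(1j * phase), hop_length=256)

In practice the mask comes from the trained separator rather than the normalization stand-in used here.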
“…Qian et al used stripe-transformer blocks to learn deep stripe features in the encoder and decoder blocks, which are composed of residual CNN blocks [5]. Geng et al developed a gated nested U-Net (GNU-Net) architecture to generate full-resolution feature maps [10]. Yuan et al used genetic algorithms to search for effective MRP-CNN structures, composed of various-sized pooling operators, to extract multi-resolution features.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Researchers have proposed many deep learning models for BSS, including recurrent neural networks (RNN) [4] [5] [11], convolutional neural networks (CNN) [12] [13], U-Net [6] [14] [15], long short-term memory (LSTM) [16] [17], and generative adversarial networks (GAN) [18] [19] [20]. The results show that a DNN model trained with the singing voices of dozens of singers can separate the singing voices of others.…”
Section: Introduction (mentioning)
Confidence: 99%