2021 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR)
DOI: 10.1109/aivr52153.2021.00067
A Survey on Recent Deep Learning-driven Singing Voice Synthesis Systems

Cited by 13 publications (6 citation statements)
References: 16 publications
“…Naturally, such systems depend critically on the expressivity of the latent representation and the distribution of training examples, as the decoder may not learn to generate a signal with characteristics that are not observable in the training set. This lack of control is particularly problematic for singing voice signals (Cho et al., 2021), as important expressive parameters like F0 are highly variable within individual notes (Dai and Dixon, 2019) and may not be accurately reproduced by SE or SVS systems, even when F0 is explicitly given in the latent representation (Choi et al., 2022).…”
Section: Singing Voice Reconstruction
confidence: 99%
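The point about within-note F0 variability can be made concrete with a small measurement sketch. The snippet below is illustrative only and is not taken from the cited papers: it estimates F0 with librosa's pYIN and reports the spread of F0, in cents, inside note boundaries that the caller supplies. The function name `f0_spread_per_note` and the `note_bounds_s` argument are hypothetical; the pitch range defaults are arbitrary assumptions.

```python
# Hedged sketch (not from the cited works): quantify how much F0 moves
# within individual notes of a singing recording.
import numpy as np
import librosa

def f0_spread_per_note(audio_path, note_bounds_s, fmin=80.0, fmax=1000.0):
    """Return the F0 standard deviation in cents for each (start, end) note, in seconds."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    # pYIN returns NaN for unvoiced frames; keep the voiced flag as well.
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    times = librosa.times_like(f0, sr=sr)
    spreads = []
    for start, end in note_bounds_s:
        sel = (times >= start) & (times < end) & voiced & ~np.isnan(f0)
        if sel.sum() < 2:
            spreads.append(np.nan)  # too few voiced frames to measure
            continue
        # Express deviation relative to the note's median pitch, in cents.
        cents = 1200.0 * np.log2(f0[sel] / np.median(f0[sel]))
        spreads.append(float(np.std(cents)))
    return spreads
```

Even a coarse measurement of this kind typically shows tens of cents of variation inside sustained sung notes, which is the behaviour the quoted passage argues a decoder cannot reproduce unless it is represented in (or controllable through) the latent code.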
“…Nercessian (2021) incorporated a differentiable harmonic-plus-noise synthesiser (Engel et al., 2020a) into an end-to-end VC model, augmenting it with convolutional pre- and post-nets to further shape the generated signal. This formulation allowed end-to-end training with perceptually informed loss functions, as opposed to requiring autoregression.…”
Section: Voice Transformation
confidence: 99%
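For readers unfamiliar with the harmonic-plus-noise model referenced above, the following is a minimal NumPy sketch of the synthesis idea in the spirit of DDSP (Engel et al., 2020). It is not Nercessian's (2021) model or the DDSP library API; the frame rate, harmonic count, and noise shaping are illustrative assumptions, and in the cited work these controls are predicted by a neural network and the synthesiser is implemented differentiably so gradients can flow through it.

```python
# Hedged sketch of harmonic-plus-noise synthesis: a bank of sinusoids at
# integer multiples of F0 plus a gain-shaped noise component.
import numpy as np

def harmonic_plus_noise(f0_hz, harm_amps, noise_gain, sr=16000, hop=64):
    """Render audio from frame-wise controls.

    f0_hz:      (T,)   fundamental frequency per frame in Hz (0 for unvoiced)
    harm_amps:  (T, K) linear amplitude of each of K harmonics per frame
    noise_gain: (T,)   linear gain of the white-noise component per frame
    """
    T, K = harm_amps.shape
    n = T * hop
    # Upsample frame-wise controls to sample rate by linear interpolation.
    t_frame = np.arange(T) * hop
    t_samp = np.arange(n)
    f0 = np.interp(t_samp, t_frame, f0_hz)
    amps = np.stack(
        [np.interp(t_samp, t_frame, harm_amps[:, k]) for k in range(K)], axis=1
    )
    gain = np.interp(t_samp, t_frame, noise_gain)
    # Harmonic part: cumulative phase of F0, harmonics above Nyquist muted.
    phase = 2.0 * np.pi * np.cumsum(f0) / sr
    k_idx = np.arange(1, K + 1)
    alias_mask = (f0[:, None] * k_idx[None, :] < sr / 2).astype(float)
    harmonic = np.sum(amps * alias_mask * np.sin(phase[:, None] * k_idx[None, :]), axis=1)
    # Noise part: white noise shaped by a time-varying gain.
    noise = gain * np.random.randn(n)
    return harmonic + noise
```

Because every operation above is differentiable with respect to the control signals, the same structure can be trained end-to-end against spectral or perceptually informed losses, which is the property the quoted passage highlights.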
“…Harshvardhan et al. [76] instead covered deep generation as part of generation in machine learning and proposed future directions. In addition to surveys on general deep data generation, other surveys focus on deep data generation in specific domains, including graph generation [77][78][79], image synthesis [80,81], text generation [82,83] and audio generation [84][85][86].…”
Section: Relationship With Existing Surveys
confidence: 99%