2022
DOI: 10.48550/arxiv.2201.07429
Preprint

Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis

Abstract: This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. Audio files are recorded with studio quality at a sampling rate of 44,100 Hz and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the …

Cited by 5 publications (7 citation statements)
References 24 publications
“…In addition, for some rare and difficult musical notes, WeSinger 2 also demonstrates a much better performance than WeSinger [10], which can be reflected in the lower F0 RMSE value. Besides, we found that most of the generated singing audios by WeSinger 2 do not have metallic or inconsistent artifacts as found in [30] and do not sound unnatural. Apart from the performance, around 3 times faster than real-time synthesis can be achieved on a single core of a moderate CPU @ 2.60 GHz in Python environment.…”
Section: Evaluations (mentioning)
confidence: 64%
“…The height of each randomly selected region represents the frequency band and the width represents the number of frames. We fix the values of height to be [20,30,40,50] and the values of width to be [190,160,70,30], which can effectively capture different resolutions in both the time domain and frequency domain. The location of each rectangle region is randomly sampled at every training step to achieve the effect of data augmentation.…”
Section: Acoustic Model (mentioning)
confidence: 99%
“…Thus, we also try our method in such a scenario and validate its effectiveness. Specifically, we use the first 30 singing voices (around 5s per voice) from the Opencpop dataset [74], embed watermarks in them, and then use these voices to train so-vits-svc according to the instruction document. After that, we use the trained model to perform voice conversion on a song.…”
Section: Real-world Voice Conversion Service (mentioning)
confidence: 99%
“…The original datasets mainly include the Opencpop dataset [9] released by Yu W et al., the OpenSinger [10] dataset released by Huang R et al., and the PopCs [11] dataset released by Liu J et al. The detailed introduction of these datasets is as TABLE 1.…”
Section: Dataset (mentioning)
confidence: 99%