ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414043

LiteSing: Towards Fast, Lightweight and Expressive Singing Voice Synthesis

Abstract: LiteSing proposed in this paper is a high-quality singing voice synthesis (SVS) system, which is fast, lightweight and expressive. This model mainly stacks several non-autoregressive WaveNet blocks in the encoder and decoder under a generative adversarial architecture, predicts full conditions from the musical score, and generates acoustic features from these conditions. The full conditions in this paper consist of dynamic spectrogram energy, voiced/unvoiced (V/UV) decision and dynamic pitch curve, which are p…
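
As a rough illustration of the three "full conditions" named in the abstract (spectrogram energy, V/UV decision, pitch curve), the sketch below derives frame-level versions of them from a magnitude spectrogram and an F0 contour. The function name `full_conditions`, the use of a per-frame L2 norm as "energy", and the convention that F0 = 0 marks an unvoiced frame are assumptions made for illustration only; the abstract does not specify them, and in LiteSing these conditions are predicted from the musical score rather than computed this way at synthesis time.

```python
import numpy as np

def full_conditions(mag_spec, f0):
    """Hypothetical helper: derive frame-level conditions from audio features.

    mag_spec: (frames, bins) magnitude spectrogram.
    f0:       (frames,) pitch in Hz, with 0 assumed to mean "unvoiced".
    """
    mag_spec = np.asarray(mag_spec, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    energy = np.linalg.norm(mag_spec, axis=1)  # assumed definition: per-frame L2 norm
    vuv = f0 > 0                               # assumed convention: positive F0 = voiced
    return {"energy": energy, "vuv": vuv, "pitch": f0}

# Toy input: three frames, two frequency bins; the middle frame is silent.
spec = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
f0 = np.array([220.0, 0.0, 440.0])
cond = full_conditions(spec, f0)
```

Targets of this kind could plausibly be extracted from recordings to supervise a condition predictor; how the paper actually does this is not stated in the truncated abstract.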

Cited by 12 publications (11 citation statements); references 9 publications.
“…To qualitatively examine the controllability of the proposed methods, we tried various style modifications by manipulating the initial LST sequence and f0 contour.…”
Section: Qualitative Analysis (mentioning)
confidence: 99%
“…Recently, interest in research on the SVS system that can reflect musical expression is increasing. A method of explicitly modeling information such as pitch curves, energy, and V/UV, which can be extracted directly from the vocal signal, was proposed in [2]. [3] proposed a method to interpret the music score more naturally by introducing a module that predicts the difference between the actual singing and the score.…”
Section: Related Work (mentioning)
confidence: 99%
“…Singing voice synthesis (SVS) systems [1]-[7] generate singing voices from musical scores which contain music information such as lyrics, tempo, pitch, etc. SVS is similar to the text-to-speech (TTS) task [8]-[13] in terms of generating speech from text.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, to predict the pitch feature better, Yi et al. [18] utilized a deep autoregressive network to capture the dependencies among consecutive acoustic features. Zhuang et al. [1] separated the pitch feature from the acoustic feature to avoid the interdependence between these pitch features and the timbre features. Ren et al. [9] introduced the pitch and energy information into the speech generation task and presented variance adaptors to make the generated audio expressive.…”
Section: Introduction (mentioning)
confidence: 99%