ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9053811
Fast and High-Quality Singing Voice Synthesis System Based on Convolutional Neural Networks

Abstract: The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing voices. As singing voices represent a rich form of expression, a powerful technique is required to model them accurately. In the proposed technique, long-term dependencies of singing voices are modeled by CNNs. An acoustic feature sequence is generated for each…
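The abstract notes that CNNs are used to model long-term dependencies of singing voices. A common way to cover long temporal contexts with feed-forward convolutions is a stack of dilated 1-D convolutions; the sketch below illustrates that general idea only (the paper's exact architecture is not specified in this excerpt, and all function names here are illustrative assumptions):

```python
import numpy as np

def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of a stack of dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_conv1d(x, w, dilation):
    """'Same'-padded dilated 1-D convolution over a 1-D feature sequence x.

    w is an odd-length kernel; each tap looks `dilation` frames apart, which
    is how stacked layers reach long-term context with few parameters.
    """
    k = len(w)
    pad = (k - 1) * dilation // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(w[j] * xp[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Four layers with kernel size 3 and dilations 1, 2, 4, 8 already see
# 31 consecutive frames per output frame.
print(receptive_field(3, [1, 2, 4, 8]))  # → 31
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth, which is why such stacks are a popular feed-forward alternative to recurrent models for sequence-level acoustic modeling.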

Cited by 17 publications (14 citation statements)
References 26 publications
“…Although the parameter generation algorithm can generate a smooth acoustic feature sequence, the computational cost at the synthesis stage increases. A recent study [11] introduced a different approach that considers dynamic features only during training. In this approach, the objective function that considers the dynamic features can be written as…”
Section: B. Acoustic Model (mentioning, confidence: 99%)
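The citation above describes a training-time objective that includes dynamic features, so that smooth trajectories are learned without running a parameter generation algorithm at synthesis. The quoted equation itself is elided, so the following is only a generic sketch of that family of objectives (the delta operator and the weighting are assumptions, not the formulation from [11]):

```python
import numpy as np

def delta(seq):
    """First-order dynamic (delta) features via a central difference
    along the time axis (axis 0)."""
    return 0.5 * (np.roll(seq, -1, axis=0) - np.roll(seq, 1, axis=0))

def dynamic_feature_loss(pred, target, w_delta=1.0):
    """Training-time objective: static MSE plus a delta-feature MSE.

    Penalizing the mismatch of dynamic features during training encourages
    smooth predicted trajectories, so no parameter generation step is
    needed at synthesis time.
    """
    static = np.mean((pred - target) ** 2)
    dynamic = np.mean((delta(pred) - delta(target)) ** 2)
    return static + w_delta * dynamic
```

The trade-off the quote points at: generation-time smoothing (the classical parameter generation algorithm) adds synthesis cost, while a loss like this moves the same smoothness constraint into training.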
“…A note pitch transition greatly influences the F0 trajectory. Therefore, we add a skip connection between the input note pitch and a hidden layer of the acoustic model to deliver the note pitch inside the acoustic model, motivated by [11]. This helps transmit the note pitch information efficiently and predict the residual component between log F0 and the note pitch.…”
Section: A. Pitch Normalization (mentioning, confidence: 99%)
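The skip-connection idea quoted above (predicting only the residual between log F0 and the note pitch) can be sketched as follows; `residual_model` and the additive output combination are illustrative assumptions, not the authors' exact architecture:

```python
import numpy as np

def predict_log_f0(note_pitch_logf0, residual_model):
    """Pitch normalization via a skip connection.

    The network (`residual_model`, a hypothetical callable) predicts only
    the residual between log F0 and the note pitch from the musical score;
    the note pitch is added back at the output. This delivers note pitch
    information directly to the output and keeps the learning target small.
    """
    residual = residual_model(note_pitch_logf0)
    return note_pitch_logf0 + residual
```

Because the network only has to model deviations (vibrato, overshoot, preparation) rather than the full F0 range, note pitch transitions propagate to the output even when the trained residual is near zero.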
“…Non-Seq2Seq singing synthesizers include those based on autoregressive architectures [17,21,22], feed-forward CNN [23], and feed-forward GAN-based approaches [24,25].…”
Section: Relation to Prior Work (mentioning, confidence: 99%)
“…First, DNN models were proposed to predict spectral information and began to significantly outperform conventional HMMs [7,8]. Later, variants of neural networks, including recurrent neural networks (RNNs) and convolutional neural networks (CNNs), also demonstrated their power in acoustic modeling [6,9-11]. Other architectures, such as the generative adversarial network (GAN), have also been shown to improve synthesized singing quality [12-17].…”
Section: Introduction (mentioning, confidence: 99%)