ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054473

Speech-To-Singing Conversion in an Encoder-Decoder Framework

Abstract: In this paper, our goal is to convert a set of spoken lines into sung ones. Unlike previous signal-processing-based methods, we take a learning-based approach to the problem. This allows us to automatically model various aspects of this transformation, thus overcoming dependence on specific inputs such as high-quality singing templates or phoneme-score synchronization information. Specifically, we propose an encoder-decoder framework for our task. Given time-frequency representations of speech and a target melod…

Cited by 12 publications (23 citation statements)
References 28 publications
“…al [23] proposed voice conversion approach to transform speech to singing style. An encoder-decoder approach to model-based learning of speech to singing voice conversion was explored in Parekh [24]. However many of the past methods are not scalable to large volumes of speech data needed for acoustic model training in a E2E lyrics transcription system.…”
Section: Related Prior Work | mentioning
confidence: 99%
“…Speech-to-sing (STS) can be regarded as an example of the general problem of style transfer. Compared with other style transfer work, the "style" of STS not only refers to the incorporation of the required melody, but also requires maintenance of the speaker's identity while transferring the timbre of speech to that of singing [274].…”
Section: Speech-to-sing (STS) | mentioning
confidence: 99%
“…The algorithm was evaluated on the task of SVC, and results showed that the algorithm can produce high-quality singing voice which is highly similar to the target speaker's voice given the normal speech samples of the target speaker. Parekh et al [274] explored a method to achieve STS conversion by employing the minimal additional information over the melody contour.…”
Section: Speech-to-sing (STS) | mentioning
confidence: 99%
“…Following the convention in the literature, we define unconditional generation as a task that aims at generating things from scratch, i.e., taking nothing but random noises as the input. In contrast, a conditional generation model takes additional input such as class labels [8], text [9], pitch labels [10, 11], or reference audio [12, 13, 14].…”
mentioning
confidence: 99%