2021
DOI: 10.48550/arxiv.2106.06426
Preprint

Catch-A-Waveform: Learning to Generate Audio from a Single Short Example

Abstract: Models for audio generation are typically trained on hours of recordings. Here, we illustrate that capturing the essence of an audio source is typically possible from as little as a few tens of seconds from a single training signal. Specifically, we present a GAN-based generative model that can be trained on one short audio signal from any domain (e.g. speech, music, etc.) and does not require pre-training or any other form of external supervision. Once trained, our model can generate random samples of arbitra…

Cited by 2 publications (2 citation statements)
References 31 publications
“…In the audio domain, the inpainting methods try to recover missing data in the waveform, which can occur due to various reasons such as distortions and transmission errors (Adler et al, 2012;Marafioti et al, 2020). The same term has also been used to describe the bandwidth extension problem, where the missing high frequency content has to be estimated (inpainted) from the low frequencies (Greshler et al, 2021).…”
Section: Music Inpainting
confidence: 99%
“…Previous works have proposed models to controllably generate e.g. images [13,17,38,45,48,51,55,57,73,76,77], videos [6,12,25,37,42,46,64,65,65,71], and audios [1,9,15,22,24,47,62,63], or separate sounds [18,19,79,80,84]. However, most of the audio works are music-related, and only a few attempts have been made to generate visually guided audio in an open domain setup [11,83].…”
confidence: 99%