2021
DOI: 10.48550/arxiv.2104.02901
Preprint

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations

Abstract: Any-to-any voice conversion (VC) aims to convert the timbre of utterances from and to any speakers seen or unseen during training. Various any-to-any VC approaches have been proposed, such as AUTOVC, AdaINVC, and FragmentVC. AUTOVC and AdaINVC utilize source and target encoders to disentangle the content and speaker information of the features. FragmentVC utilizes two encoders to encode source and target information and adopts cross attention to align the source and target features with similar phonetic content.…
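The cross-attention alignment described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the module name, feature dimensions, and the use of `torch.nn.MultiheadAttention` are choices made here for clarity.

```python
# Minimal sketch of FragmentVC-style cross attention: each source (content)
# frame attends over the target speaker's frames, retrieving target fragments
# with similar phonetic content. Shapes and modules are illustrative only.
import torch
import torch.nn as nn

class CrossAttentionAligner(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        # Queries come from the source side, keys/values from the target side.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, source_feats, target_feats):
        # source_feats: (batch, T_src, d_model) content features of the source utterance
        # target_feats: (batch, T_tgt, d_model) features of the target speaker's utterance
        aligned, attn_weights = self.attn(
            query=source_feats, key=target_feats, value=target_feats)
        return aligned, attn_weights

# Toy usage with random tensors standing in for encoder outputs.
src = torch.randn(1, 100, 256)   # 100 source frames
tgt = torch.randn(1, 120, 256)   # 120 target frames
aligned, weights = CrossAttentionAligner()(src, tgt)
print(aligned.shape, weights.shape)  # (1, 100, 256), (1, 100, 120)
```

The aligned output then carries target-speaker characteristics frame-aligned to the source content, which a decoder could consume to synthesize the converted spectrogram.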

Cited by 3 publications (6 citation statements)
References 22 publications
“…The Summit supercomputer was utilized to train the protein LMs (pLMs), which required 5616 GPUs and a TPU Pod with up to 1024 cores. [102] performed numerical simulations, exploring the added value of three SSL models, notably (I) autoregressive predictive coding (APC), (II) contrastive predictive coding (CPC), and (III) wav2vec 2.0, in performing flexible classification and reliable recognition of datasets engaged in auto-regressive language modeling. Several any-to-any voice conversion (VC) methods have been proposed, like AUTOVC, AdaINVC, and FragmentVC.…”
Section: Auto-regressive Language Modeling
confidence: 99%
“…In [29], the local and global style information are considered simultaneously. Self-supervised speech representations are employed in [18] and [19] for voice conversion. Wang et al. [16] used mutual information to measure the dependencies between speech representations.…”
Section: B. Feature Disentanglement Based Voice Conversion
confidence: 99%
“…This reconstruction loss encourages well-defined output spectrograms and ensures that the auto-encoder architecture does not lose too much information. It is also an essential part and a main objective for feature disentanglement-based any-to-any voice conversion methods [10], [11], [19].…”
Section: Loss Function
confidence: 99%
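The reconstruction objective this citation refers to can be written out compactly. Below is a minimal sketch assuming mel-spectrogram targets and an L1 loss; the decoder stub and the loss choice are illustrative assumptions, not any specific paper's setup.

```python
# Sketch of the auto-encoder reconstruction loss used in feature
# disentanglement-based VC: the model must rebuild the input spectrogram
# from its own (content, speaker) representations.
import torch.nn.functional as F

def reconstruction_loss(decoder, content_emb, speaker_emb, mel_target):
    """L1 distance between the decoded spectrogram and the original mel.

    decoder      : callable mapping (content, speaker) -> predicted mel
    content_emb  : (batch, T, d_content) output of the content encoder
    speaker_emb  : (batch, d_speaker)    output of the speaker encoder
    mel_target   : (batch, T, n_mels)    ground-truth mel-spectrogram
    """
    mel_pred = decoder(content_emb, speaker_emb)
    return F.l1_loss(mel_pred, mel_target)
```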
“…For example, FragmentVC applies wav2vec 2.0 features as the input of the content encoder [13]. In S2VC [14], CPC [15] is used as the input of both the content encoder and the speaker encoder.…”
Section: Introduction
confidence: 99%
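The design this citation mentions, one self-supervised representation feeding both encoders, can be sketched as follows. The GRU encoders, layer sizes, and feature dimension are hypothetical placeholders, not S2VC's actual modules.

```python
# Sketch of using one self-supervised representation (e.g. CPC-like features)
# as the input of both the content encoder and the speaker encoder.
import torch
import torch.nn as nn

class TwoEncoderVC(nn.Module):
    def __init__(self, feat_dim=256, d_model=256):
        super().__init__()
        self.content_encoder = nn.GRU(feat_dim, d_model, batch_first=True)
        self.speaker_encoder = nn.GRU(feat_dim, d_model, batch_first=True)

    def forward(self, src_ssl_feats, tgt_ssl_feats):
        # src_ssl_feats / tgt_ssl_feats: (batch, T, feat_dim) self-supervised
        # features extracted from the source and target utterances.
        content, _ = self.content_encoder(src_ssl_feats)   # frame-level content
        _, spk_state = self.speaker_encoder(tgt_ssl_feats)
        speaker = spk_state[-1]   # (batch, d_model) utterance-level speaker embedding
        return content, speaker
```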