2022
DOI: 10.1007/978-3-031-19836-6_18
VoViT: Low Latency Graph-Based Audio-Visual Voice Separation Transformer

Cited by 8 publications
(4 citation statements)
References 36 publications
“…Dataset. We use Acappella [10], the only publicly available AV singing voice dataset. It consists of around 46 hours of solo-singing videos spanning four language categories.…”
Section: Lip Synchronisation in Singing Voice
confidence: 99%
“…Training. We use the same training pipeline that is used for training the model Y-Net-mr in [10]. Y-Net-mr is a U-Net [23] conditioned by visual features extracted from cropped mouth frames using an 18-layer mixed convolution network [22].…”
Section: Singing Voice Separation
confidence: 99%
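The cited Y-Net-mr setup is a U-Net whose audio features are conditioned on a visual embedding extracted from cropped mouth frames. One common way to realize such conditioning is to tile the per-clip visual embedding over time and concatenate it channel-wise with the audio bottleneck features; the sketch below illustrates only that fusion step, with shapes and the concatenation scheme chosen as assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def condition_bottleneck(audio_feats, visual_feats):
    """Fuse a visual embedding into audio U-Net bottleneck features.

    audio_feats:  (C_a, T) array of audio bottleneck features over T frames
    visual_feats: (C_v,)   per-clip visual embedding (e.g. from a mouth-crop
                           video network)
    Returns a (C_a + C_v, T) array: the visual embedding is tiled over the
    time axis and concatenated channel-wise with the audio features.
    (Illustrative scheme; the actual fusion in the cited model may differ.)
    """
    c_v = visual_feats.shape[0]
    t = audio_feats.shape[1]
    tiled = np.broadcast_to(visual_feats[:, None], (c_v, t))
    return np.concatenate([audio_feats, tiled], axis=0)

# Example with assumed sizes: a 512-channel audio bottleneck over 64 frames
# fused with a 128-dimensional visual embedding.
fused = condition_bottleneck(np.zeros((512, 64)), np.ones(128))
print(fused.shape)  # (640, 64)
```

The decoder half of the U-Net would then upsample these fused features back to spectrogram resolution to predict the separation mask.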