2021
DOI: 10.48550/arxiv.2111.14448
Preprint

AVA-AVD: Audio-Visual Speaker Diarization in the Wild

Eric Zhongcong Xu,
Zeyang Song,
Satoshi Tsutsui
et al.

Abstract: Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor environments like meeting rooms or news studios, which are quite different from in-the-wild videos in many scenarios such as movies, documentaries, and audience sitcoms. To create a testbed that can effectively compare diarization methods on videos in the wild, we annotate the speaker diarization labels on the AVA movie dataset and …

Cited by 1 publication (3 citation statements)
References 44 publications
“…First, the shot boundaries in the video are detected.¹ The shots, in practice, correspond to sequences of frames that look similar concerning some similarity measures. After that, the frames in each shot are grouped in face tracks and feature vectors are extracted from them.…”
Section: B Face Tracking and Recognition · mentioning
confidence: 99%
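The shot-detection step described in the statement above can be illustrated with a minimal sketch. It assumes each frame is already decoded and flattened into a list of grayscale intensities, and it marks a shot boundary wherever the mean absolute difference between consecutive frames exceeds a threshold. The names `detect_shots` and `frame_distance` and the default `threshold=30.0` are hypothetical. This is a simplified stand-in, not PySceneDetect's algorithm, which the citing work reports using for the actual implementation.

```python
from typing import List, Sequence

# Hypothetical frame representation: grayscale intensities in [0, 255], flattened.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_shots(frames: List[Frame], threshold: float = 30.0) -> List[range]:
    """Split a frame sequence into shots.

    A new shot starts at frame i whenever frames i-1 and i differ by more
    than `threshold` (an assumed value). Each shot is returned as a range
    of frame indices.
    """
    if not frames:
        return []
    shots: List[range] = []
    start = 0
    for i in range(1, len(frames)):
        if frame_distance(frames[i - 1], frames[i]) > threshold:
            shots.append(range(start, i))  # close the current shot
            start = i                      # the boundary frame opens a new shot
    shots.append(range(start, len(frames)))  # close the final shot
    return shots


if __name__ == "__main__":
    # Three dark frames followed by two bright frames give two shots.
    frames = [[10.0] * 4] * 3 + [[200.0] * 4] * 2
    print(detect_shots(frames))
```

Within each returned range, a real pipeline would then run face detection and link detections into face tracks before extracting feature vectors.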
“…Finally, for every cluster, an identity label is assigned for verifying the active speaker in the next module of our pipeline [21]. ¹ We relied on PySceneDetect framework for the implementation of shot detection (https://github.com/Breakthrough/PySceneDetect).…”
Section: B Face Tracking and Recognition · mentioning
confidence: 99%
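The quoted statement says that face-track feature vectors are clustered and each cluster receives an identity label, but it does not name the clustering method. One possible approach is sketched below: a greedy online clustering by cosine similarity, where each track joins the most similar existing cluster (compared against the cluster's mean feature) if that similarity reaches a threshold, and otherwise starts a new identity. The function `assign_identities`, the `threshold=0.8` default, and the clustering rule itself are assumptions for illustration, not the citing paper's method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def assign_identities(track_features: List[Sequence[float]],
                      threshold: float = 0.8) -> List[int]:
    """Hypothetical greedy clustering of face-track features.

    Returns one integer identity label per track. Cluster centroids are
    running means of the features assigned to them.
    """
    centroids: List[List[float]] = []
    sizes: List[int] = []
    labels: List[int] = []
    for feat in track_features:
        best, best_sim = -1, threshold
        for k, c in enumerate(centroids):
            sim = cosine(feat, c)
            if sim >= best_sim:  # most similar cluster at or above threshold
                best, best_sim = k, sim
        if best == -1:
            # No sufficiently similar cluster: start a new identity.
            centroids.append(list(feat))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Join the cluster and update its running-mean centroid.
            n = sizes[best]
            centroids[best] = [(c * n + f) / (n + 1)
                               for c, f in zip(centroids[best], feat)]
            sizes[best] = n + 1
            labels.append(best)
    return labels


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(assign_identities(feats))
```

A greedy pass like this is order-dependent; offline methods such as agglomerative clustering avoid that at the cost of comparing all pairs of tracks.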