2022 IEEE 65th International Midwest Symposium on Circuits and Systems (MWSCAS), 2022
DOI: 10.1109/mwscas54063.2022.9859533

Audio-visual Speaker Diarization: Improved Voice Activity Detection with CNN based Feature Extraction

Abstract: Speaker diarization is the task of identifying "who spoke when". Audio clips of speakers are now usually accompanied by visual information, and recent work has substantially improved the performance of speaker diarization systems by exploiting the visual information synchronized with the audio in Audio-Visual (AV) content. This paper presents a deep learning architecture for an AV speaker diarization system that emphasizes Voice Activity Detection (VAD). Traditional AV speaker…

Cited by 2 publications
References 16 publications