2024
DOI: 10.1007/s11042-024-18457-9

AS-Net: active speaker detection using deep audio-visual attention

Abduljalil Radman,
Jorma Laaksonen

Abstract: Active Speaker Detection (ASD) aims to identify the active speaker among multiple speakers in a video scene. Previous ASD models often extract audio and visual features from long video clips using a complex 3D Convolutional Neural Network (CNN) architecture. Although models based on 3D CNNs can generate discriminative spatial-temporal features, this comes at the cost of high computational complexity, and they frequently struggle to detect active speakers in short video clips. This work proposes the …

Cited by: 2 publications
References: 41 publications