2022 Sensor Data Fusion: Trends, Solutions, Applications (SDF) 2022
DOI: 10.1109/sdf55338.2022.9931697

Audio-Visual Active Speaker Identification: A comparison of dense image-based features and sparse facial landmark-based features

Abstract: The field of speaker detection is relatively well researched. Multiple solutions exist that use audio alone, video alone, or a combination of both. On the audio side, a popular feature representation is mel-frequency cepstral coefficients (MFCCs), which are a sparse representation of the audio signal. On the video side, pixel intensities are mostly used, which are not sparse at all. In this paper, we look at a sparse video feature representation, namely facial landmarks. We first evaluate what selection of land…
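
To make the dense versus sparse contrast concrete, here is a minimal sketch that turns a subset of facial landmarks into a compact feature vector and compares its length with a flattened pixel-intensity vector. It assumes the common 68-point iBUG 300-W landmark layout (mouth = points 48 to 67, 0-indexed) and a 112×112 grayscale face crop; both are illustrative assumptions, not details taken from the paper. The function names and the bounding-box normalisation are hypothetical and are not the authors' feature pipeline.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Assumption: 68-point iBUG 300-W landmark layout, 0-indexed.
# Points 48..67 cover the outer and inner lip contours.
MOUTH_IDX = range(48, 68)


def landmark_features(landmarks: Sequence[Point], indices=MOUTH_IDX) -> List[float]:
    """Hypothetical helper: select a landmark subset and normalise it.

    Coordinates are scaled to the subset's own bounding box, so the
    result is a flat [x0, y0, x1, y1, ...] vector with values in [0, 1].
    """
    pts = [landmarks[i] for i in indices]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    min_x, min_y = min(xs), min(ys)
    # Guard against a degenerate (zero-width or zero-height) box.
    w = max(max(xs) - min_x, 1e-9)
    h = max(max(ys) - min_y, 1e-9)
    feats: List[float] = []
    for x, y in pts:
        feats.extend(((x - min_x) / w, (y - min_y) / h))
    return feats


def pixel_feature_size(width: int, height: int, channels: int = 1) -> int:
    """Length of a flattened pixel-intensity vector for a face crop."""
    return width * height * channels


if __name__ == "__main__":
    # Dummy landmarks for illustration only; a real system would take
    # these from a face landmark detector.
    dummy = [(float(i), float(i % 7)) for i in range(68)]
    print(len(landmark_features(dummy)))  # 40 values (20 points x 2 coords)
    print(pixel_feature_size(112, 112))   # 12544 values
```

Under these assumptions, the mouth-region landmark vector holds 40 values per frame, whereas the flattened crop holds 12,544, which is the sense in which landmarks form a sparse video representation.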
