2022
DOI: 10.3390/electronics11030440

Person Localization Model Based on a Fusion of Acoustic and Visual Inputs

Abstract: PLEA is an interactive, biomimetic robotic head with non-verbal communication capabilities. PLEA's reasoning is based on a multimodal approach that combines video and audio inputs to determine the current emotional state of a person. PLEA expresses emotions using facial expressions generated in real time, which are projected onto a 3D face surface. In this paper, a more sophisticated computation mechanism is developed and evaluated. The model for audio-visual person separation can locate a talking person in a crowde…

Citations: cited by 6 publications (2 citation statements)
References: 31 publications (30 reference statements)
“…Module Input picture (1) grabs an image from the video stream and finds all faces on it. Module chooses a face based on the algorithm described in Koren et.al [20], crops it and resizes it to a predetermined size. CNN facial expression extraction (2) module takes previously created images and with the use of the efficient residual neural network (ENet) [21] extracts seven standard expressions in the form of an array.…”
Section: Framework (mentioning)
confidence: 99%
“…These sensors are used as a part of sensing modalities to analyze different information spaces including vision, sound, touch, etc. Based on the number of used sensing modalities these inputs are then fused in a multimodal approach [5].…”
Section: Introduction (mentioning)
confidence: 99%