Person Localization Model Based on a Fusion of Acoustic and Visual Inputs

Koren, Leon; Stipančić, Tomislav; Ričko, Andrija; Orsag, Luka

doi:10.3390/electronics11030440

Cited by 6 publications

(2 citation statements)

References 31 publications

(30 reference statements)


“…Module Input picture (1) grabs an image from the video stream and finds all faces on it. Module chooses a face based on the algorithm described in Koren et al. [20], crops it and resizes it to a predetermined size. CNN facial expression extraction (2) module takes previously created images and with the use of the efficient residual neural network (ENet) [21] extracts seven standard expressions in the form of an array.…”

Section: Framework (mentioning)

confidence: 99%

Generating non-verbal responses in virtual agent with use of LSTM network

Koren, Stipancic

2024

Preprint


This paper investigates nonverbal communication in human interactions, with a specific focus on facial expressions. Employing a Long Short-Term Memory (LSTM) architecture and a customized facial expression framework, our approach aims to improve virtual agent interactions by incorporating subtle nonverbal cues. The paper contributes to the emerging field of facial expression generation, addressing gaps in current research and presenting a novel framework within Unreal Engine 5. The model's architecture, trained on the CANDOR corpus, captures temporal dynamics, and refines hyperparameters for optimal performance. During testing, the trained model showed a cosine similarity of -0.95. This enables the algorithm to accurately respond to non-verbal cues and interact with humans in a way that is comparable to human-human interaction. Unlike other approaches in the field of facial expression generation, the presented method is more comprehensive and enables the integration of a multi-modal approach for generating facial expressions. Future work involves integrating blendshape generation, real-world testing, and the inclusion of additional modalities to create a comprehensive framework for seamless human-agent interactions beyond facial expressions.


“…These sensors are used as a part of sensing modalities to analyze different information spaces including vision, sound, touch, etc. Based on the number of used sensing modalities these inputs are then fused in a multimodal approach [5].…”

Section: Introduction (mentioning)

confidence: 99%

Context-Driven Method in Realization of Optimized Human-Robot Interaction

Koren, Stipančić, Ričko, et al.

2022

Teh. glas. (Online)

Self Cite


Perceptual uncertainty and environmental volatility are among the most enduring challenges in robotic research today. Contemporary robotic systems are usually designed to work in specific and controlled domains where a total number of variables is defined. Traditional solutions therefore often result in over-constrained interaction spaces or rigid system architectures where any unexpected change can result in system failure. The focus of this work is set on achieving a constant adaptation of the system to changes through interaction. A computational mechanism based on the entropy reduction method is integrated along with the three-component control model. This model is seen as a context-to-data interpreter used to provide context-aware reasoning to the technical system. The mechanism is using a decrease in interaction uncertainties when proofs are provided to the system. In this way, the robot can choose the right interaction strategy that resolves reasoning ambiguities most efficiently.
