This paper presents a supervised learning method for automatic visual detection of the active speaker in multiparty interactions. The detectors are built using a multimodal multiparty interaction dataset previously recorded to explore patterns in the visual focus of attention of humans. Three conditions are included: two humans involved in task-based interaction with a robot; the same two humans involved in task-based interaction where the robot is replaced by a third human; and a free three-party human interaction. The paper also presents an evaluation of the active speaker detection method in a speaker-dependent experiment, showing that the method achieves good accuracy in a fairly unconstrained scenario using only image data as input. The main goal of the presented method is to provide real-time detection of the active speaker within a broader framework implemented on a robot and used to generate natural focus-of-visual-attention behavior during multiparty human-robot interactions.
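As an illustration only, the supervised setting described above can be sketched with a minimal stand-in pipeline: face crops are assumed to be pre-extracted and reduced to fixed-length feature vectors, and a logistic-regression detector (a deliberately simple substitute for the paper's actual model) labels each frame as speaking or not speaking. All data here is synthetic; the feature construction and model choice are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_detector(X, y, lr=0.5, epochs=200):
    """Train a logistic-regression speaking/not-speaking detector
    (illustrative stand-in for the paper's visual model)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)  # cross-entropy gradient step
        b -= lr * np.mean(p - y)
    return w, b

def detect(X, w, b):
    """Per-frame binary decision: 1 = active speaker, 0 = silent."""
    return (sigmoid(X @ w + b) > 0.5).astype(int)

# Synthetic "face-region" features: speaking frames are assumed to have
# larger motion energy than silent frames (a labeled-data assumption).
X_speak = rng.normal(2.0, 1.0, size=(100, 8))
X_quiet = rng.normal(-2.0, 1.0, size=(100, 8))
X = np.vstack([X_speak, X_quiet])
y = np.concatenate([np.ones(100), np.zeros(100)])

w, b = train_detector(X, y)
acc = np.mean(detect(X, w, b) == y)
```

Because the method is evaluated speaker-dependently, training and test frames in the sketch come from the same synthetic "speaker" distribution.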
This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suited to the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data, and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods, investigating the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
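The framing above can be sketched in miniature: per-frame predictors (here, one-hot speech activity plus noise dimensions standing in for head-orientation features) feed a classifier over candidate gaze targets, compared against a "look at the current speaker" heuristic baseline. A softmax regression is used as a deliberately simple substitute for the paper's feedforward and LSTM models, and the data is synthetic; none of the names or numbers below come from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_parties = 600, 3

# Predictors: one-hot speech activity of three parties + 2 noise dims
# standing in for head-orientation features (an assumption of this sketch).
speaker = rng.integers(0, n_parties, size=N)
X = np.hstack([np.eye(n_parties)[speaker],
               rng.normal(0, 1, size=(N, 2))])

# Gaze target: the modeled person mostly looks at the current speaker,
# occasionally elsewhere (10% of frames, chosen at random).
y = speaker.copy()
flip = rng.random(N) < 0.1
y[flip] = rng.integers(0, n_parties, size=flip.sum())

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train(X, y, n_classes, lr=0.5, epochs=300):
    """Softmax-regression stand-in for the neural gaze-target models."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(y)
    return W

W = train(X, y, n_parties)
model_acc = np.mean(np.argmax(softmax(X @ W), axis=1) == y)
# Heuristic baseline: always gaze at the current speaker.
baseline_acc = np.mean(speaker == y)
```

The baseline illustrates why speech activity is such a strong predictor in the listening state: when the modeled person follows the speaker, "gaze at whoever talks" is already hard to beat.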
This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, thus remaining consistent with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting. However, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.
Index Terms—active speaker detection and localization, language acquisition through development, transfer learning, cognitive systems and development
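The core idea of audio-supervised visual learning can be illustrated with a toy sketch: a crude voice-activity threshold on audio energy yields noisy pseudo-labels, and a visual classifier is trained on those pseudo-labels alone, with ground truth used only for evaluation. Everything here is synthetic and simplified (logistic regression in place of the paper's model); it shows the transfer principle, not the authors' system.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400

# Ground-truth per-frame speaking activity (used ONLY for evaluation).
truth = (rng.random(N) < 0.5).astype(int)

# Synthetic audio energy: thresholding it acts as a crude voice-activity
# detector and produces noisy pseudo-labels -- no manual annotation.
energy = truth + rng.normal(0, 0.3, size=N)
pseudo = (energy > 0.5).astype(int)

# Synthetic visual (face) features correlated with true speaking activity.
X = truth[:, None] * 1.5 + rng.normal(0, 1.0, size=(N, 6))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_visual(X, y, lr=0.5, epochs=300):
    """Visual detector trained on audio-derived pseudo-labels only."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = train_visual(X, pseudo)
pred = (sigmoid(X @ w + b) > 0.5).astype(int)
visual_acc = np.mean(pred == truth)  # scored against ground truth
```

Once trained, the visual detector needs no audio at inference time, which is what lets it back up the acoustic channel in noisy conditions.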