2018
DOI: 10.1109/tpami.2017.2782819

Tracking Gaze and Visual Focus of Attention of People Involved in Social Interaction

Abstract: The visual focus of attention (VFOA) has been recognized as a prominent conversational cue. We are interested in estimating and tracking the VFOAs associated with multi-party social interactions. We note that in this type of situation the participants either look at each other or at an object of interest; therefore, their eyes are not always visible. Consequently, neither gaze nor VFOA estimation can be based on eye detection and tracking. We propose a method that exploits the correlation between eye gaze and h…

Cited by 72 publications (38 citation statements)
References 39 publications
“…Depending on the available groundtruth annotations, we measure AP at frame level, considering each pair as an independent sample, or at shot-level, if more detailed annotations are not available. Frame level is used for UCO-LAEO and AVA-LAEO and, following previous work [16,18], shot level for TVHID.…”
Section: Evaluation Protocols and Scoring Methodology (mentioning; confidence: 99%)
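The frame-level versus shot-level distinction in the statement above can be illustrated with a minimal Python sketch. It assumes, as the quote says, that frame-level AP treats every (frame, pair) prediction as an independent sample. For shot level it makes a hypothetical choice not stated in the source: a shot's score is the maximum over all its frames and pairs, and its label is positive if any frame is positive. The function names and toy data are illustrative only.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for label in labels if label)
    if n_pos == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


def frame_level_ap(samples):
    # Each sample is (shot_id, frame_id, pair_id, score, label);
    # every (frame, pair) prediction counts as an independent sample.
    return average_precision([s[3] for s in samples], [s[4] for s in samples])


def shot_level_ap(samples):
    # Hypothetical aggregation: shot score = max over its frames and pairs,
    # shot label = positive if any of its frames is positive.
    shot_score, shot_label = {}, {}
    for shot, _frame, _pair, score, label in samples:
        shot_score[shot] = max(score, shot_score.get(shot, score))
        shot_label[shot] = shot_label.get(shot, False) or bool(label)
    shots = sorted(shot_score)
    return average_precision([shot_score[s] for s in shots],
                             [shot_label[s] for s in shots])


# Toy data (invented for illustration): three shots, one pair per frame.
samples = [
    ("shotA", 0, "p1", 0.9, 1),
    ("shotA", 1, "p1", 0.4, 0),
    ("shotB", 0, "p1", 0.7, 0),
    ("shotB", 1, "p1", 0.2, 1),
    ("shotC", 0, "p1", 0.6, 1),
]
print(frame_level_ap(samples))  # 34/45, about 0.756
print(shot_level_ap(samples))   # 1.0
```

On this toy data the two protocols disagree: the low-scoring positive frame in shot B hurts frame-level AP, while at shot level shot B inherits the higher score of its other frame.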
“…This problem is addressed in [23] with a deep learning model that reasons about human gaze and 3D geometrical relationships between different views of the same scene. The authors of [18] consider scenarios where multiple people are involved in a social interaction. Given that the eyes of a person are not always visible (e.g.…”
Section: Related Work (mentioning; confidence: 99%)
“…Both the training and testing data involved 64 2D+t sequences; herein, the training size is 25, while the testing size is 39. The data had varying spatial (0.27-0.77 mm) and temporal resolution (11-30 Hz). The training data were annotated by CLUST as ground truth (center of blood vessels) of fiducial features throughout the acquisition sequence.…”
Section: A Liver Ultrasound Data and Attention-aware Video Generation (mentioning; confidence: 99%)
“…Yun et al (Yun et al, 2012) evaluates two-person interaction based on a wide variety of geometric body features, such as joint keypoints and distances, and joint-to-plane distances. Massé et al (Massé et al, 2017) propose a framework where correlation between head pose and eye gaze is used to estimate the VFOA. The authors of some of these works also address the importance of these features to estimate attention in the field of HCI.…”
Section: Vision-based Attention Estimation (mentioning; confidence: 99%)
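The statement above says only that the correlation between head pose and eye gaze is used to estimate the VFOA; the model itself is not described here. The Python sketch below is therefore not Massé et al.'s method but a deliberately simplified, hypothetical illustration of the general idea. Gaze angles are predicted from head angles through an assumed linear `gain`, and the VFOA is the target whose direction lies angularly closest to that gaze, provided it falls within an assumed `max_angle`. All names, parameters and coordinates are hypothetical.

```python
import math


def estimate_gaze(head_yaw, head_pitch, gain=1.0):
    # Hypothetical linear head-to-gaze model: gaze angle = gain * head angle.
    return gain * head_yaw, gain * head_pitch


def direction(yaw, pitch):
    # Unit vector for yaw/pitch in radians (x right, y up, z forward).
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard acos against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def estimate_vfoa(head_pos, head_yaw, head_pitch, targets,
                  gain=1.0, max_angle=math.radians(20)):
    """Return the name of the closest target within max_angle, else None."""
    gaze = direction(*estimate_gaze(head_yaw, head_pitch, gain))
    best, best_angle = None, max_angle
    for name, pos in targets.items():
        to_target = tuple(p - h for p, h in zip(pos, head_pos))
        ang = angle_between(gaze, to_target)
        if ang < best_angle:
            best, best_angle = name, ang
    return best


# Toy scene (invented): one person straight ahead, a screen to the right.
targets = {"person_B": (0.0, 0.0, 2.0), "screen": (2.0, 0.0, 1.0)}
print(estimate_vfoa((0.0, 0.0, 0.0), 0.0, 0.0, targets))                      # person_B
print(estimate_vfoa((0.0, 0.0, 0.0), math.radians(30), 0.0, targets))        # None
print(estimate_vfoa((0.0, 0.0, 0.0), math.radians(30), 0.0, targets, gain=2.0))  # screen
```

The last two calls show why a head-to-gaze relation matters when the eyes are not visible: the same 30° head turn matches no target if gaze is assumed to equal head pose, but matches the screen if the eyes are assumed to contribute part of the gaze shift.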