2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv48630.2021.00111

Integrating Human Gaze into Attention for Egocentric Activity Recognition

Cited by 28 publications (15 citation statements). References 20 publications.
“…Moreover, in Section 5.5, blind participants suggested enabling future smart glasses to detect specific moments (e.g., sitting in a table or talking with someone) to control feedback delivery. We believe that egocentric activity recognition [59,65,107], a widely-studied problem in the computer vision community, can help realize this experience.…”
Section: Implications Regarding Teachable Interfaces
confidence: 96%
“…For instance, one recent work proposed an egocentric activity recognition model assuming that input data inherently contains the user's gaze motion [65]. However, that model may not work on data collected by blind people since the assumption, (i.e., eye gaze), only applies to a certain population (i.e., sighted people).…”
Section: Implications Regarding Teachable Interfaces
confidence: 99%
“…This leads to a lack of available datasets with cognitively demanding tasks, which could potentially benefit from supplementary gaze information. Consequently, the use of MET systems for EAR has been severely limited and researchers have either explored gaze prediction and modeling (Fathi et al 2012;Huang et al 2020;Li et al 2021;Min and Corso 2021) or utilized non-contextual gaze metrics such as fixation and saccade information for action recognition (Kit and Sullivan 2016;Li et al 2015;Liao et al 2018). Fathi et al (2012) presented a generative probabilistic model that combines gaze prediction and object-based features as multi-modal input for action recognition during daily actions.…”
Section: Gaze-based Action Recognition
confidence: 99%
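The non-contextual gaze metrics mentioned in the statement above (fixation and saccade information) can be derived from a raw gaze trace with a simple velocity threshold. The sketch below is illustrative only and is not code from any of the cited works; the function name fixation_saccade_stats, the 30 deg/s threshold, and the toy trace are assumptions made for demonstration.

```python
import numpy as np

def fixation_saccade_stats(gaze_xy, fps=30.0, vel_thresh=30.0):
    """Split a gaze trace into fixation vs. saccade samples with a simple
    velocity threshold (I-VT style) and summarise them as scalar features.

    gaze_xy    : (T, 2) gaze positions, e.g. in degrees of visual angle
    fps        : sampling rate of the gaze signal in Hz
    vel_thresh : speed (deg/s) above which a sample counts as a saccade
    """
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    speed = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1) * fps  # deg/s
    is_saccade = speed > vel_thresh
    return {
        "fixation_ratio": float((~is_saccade).mean()),
        "saccade_ratio": float(is_saccade.mean()),
        "mean_saccade_speed": float(speed[is_saccade].mean()) if is_saccade.any() else 0.0,
    }

# Toy trace: a steady fixation followed by a fast horizontal gaze shift.
trace = np.concatenate([np.tile([10.0, 5.0], (20, 1)),
                        np.linspace([10.0, 5.0], [25.0, 5.0], 10)])
print(fixation_saccade_stats(trace))
```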
“…We also follow [13] and report results for two subsets within val/test: unseen participants and tail classes. For EGTEA, we follow [30,45,47] and report top-1 accuracy and mean class accuracy using the first train/test split.…”
Section: Implementation Details
confidence: 99%
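For reference, a minimal sketch of how the two metrics named above (top-1 accuracy and mean class accuracy) are typically computed is given below; it is not the evaluation code of the cited works, and the function names and toy data are illustrative.

```python
import numpy as np

def top1_accuracy(pred, gt):
    """Fraction of samples whose top-scoring class equals the ground truth."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float((pred == gt).mean())

def mean_class_accuracy(pred, gt):
    """Average of per-class recalls, so rare (tail) classes weigh as much as frequent ones."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    per_class = [(pred[gt == c] == c).mean() for c in np.unique(gt)]
    return float(np.mean(per_class))

# Toy example with 3 classes and 5 test clips.
pred = [0, 1, 1, 2, 2]
gt   = [0, 1, 2, 2, 2]
print(top1_accuracy(pred, gt))        # 0.8
print(mean_class_accuracy(pred, gt))  # (1.0 + 1.0 + 2/3) / 3 ≈ 0.889
```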
“…First, as there is no audio input to the transformer, we do not use modality encodings either. Second, following previous methods [27,40,45,47,58,59] that train using a single head for actions and report only action accuracy, we use a single summary embedding for actions, rather than verb/noun embeddings. Accordingly, the language model utilises a single word-embedding for actions, with a dimension of 512.…”
Section: F. EGTEA Implementation Details
confidence: 99%
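As an illustration of the single-head design described in the statement above, the sketch below maps one 512-dimensional summary embedding directly to action logits instead of using separate verb and noun heads. It is a hypothetical PyTorch snippet, not the cited model's code; SingleActionHead and the default class count (106 actions, as in EGTEA Gaze+) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class SingleActionHead(nn.Module):
    """Hypothetical single-head classifier: one summary embedding is mapped
    directly to action logits, rather than to separate verb and noun logits."""

    def __init__(self, feat_dim=512, num_actions=106):
        super().__init__()
        # 106 action classes as in EGTEA Gaze+ (illustrative default).
        self.classifier = nn.Linear(feat_dim, num_actions)

    def forward(self, summary_embedding):           # (B, feat_dim)
        return self.classifier(summary_embedding)   # (B, num_actions) action logits

head = SingleActionHead()
logits = head(torch.randn(4, 512))                  # batch of 4 summary embeddings
print(logits.shape)                                  # torch.Size([4, 106])
```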