2022
DOI: 10.48550/arxiv.2208.02485
Preprint

RAZE: Region Guided Self-Supervised Gaze Representation Learning

Cited by 3 publications (3 citation statements)
References 73 publications
“…In the pretext task, the model learns generalizable feature representations of the data distribution using labeled data, while in the downstream task, the model transfers its pretext knowledge to a different task with less labeled data. For example, Dubey et al (2022) had a pretext task of using the relative pupil positions in estimating the gaze direction, i.e., right, left, or center, which was then used for a downstream task of visual Attention Monitoring. In the Contrastive Learning SSL approach, the model is trained to identify similar, i.e., positive, and dissimilar, i.e., negative, pairs of data points; this helps the model to encode the data into a representation space where similar data points are close and dissimilar data points are far apart (Chen et al, 2020).…”
Section: Discussion (mentioning)
confidence: 99%
“…The model contains three major parts: (1) a network based on ResNet blocks to extract the gaze representations from the input images and compute the representation difference, (2) an alignment sub-network to predict the motion parameters (translation and relative scale) between an input image and a target output, and (3) a trained encoder-decoder network to predict a warping field which warps the input using a grid sampling operation and synthesizes a gaze redirection output. Next, Dubey et al [47] proposed RAZE to learn gaze representation via auxiliary supervision to overcome the requirement of large scale annotated data, as shown in Figure 3B. RAZE first performs pseudo labelling of the detected faces based on facial landmarks, then maps input image to the label space via a backbone network aka “Ize-Net”. Unfortunately, studies via unsupervised DL methods for detailed driver gaze analysis were not yet available, based on the extensive literature review.…”
Section: Driver Gaze Analysis (mentioning)
confidence: 99%
“…In the pretext task, the model learns generalizable feature representations of the data distribution using labeled data, while in the downstream task, the model transfers its pretext knowledge to a different task with less labeled data. For example, Dubey et al (2022) had a pretext task of using the relative pupil positions in estimating the gaze direction, i.e., right, left, or center, which was then used for a downstream task of visual Attention Monitoring. In the Contrastive Learning SSL approach, the model is trained to identify similar, i.e., positive, and dissimilar, i.e., negative, pairs of data points; this helps the model to encode the data into a representation space where similar data points are close and dissimilar data points are far apart (Chen et al, 2020).…”
Section: Model Evaluation (mentioning)
confidence: 99%