2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.02033

Egocentric Prediction of Action Target in 3D

Cited by 10 publications (3 citation statements) · References 44 publications
“…By mapping actions to specific regions within a scene, this technique enables the understanding and prediction of human activities in a given environment. Li et al. (2022) focused on anticipating as early as possible the target location of a person's object manipulation action in a 3D workspace. While this is a special case of trajectory forecasting, the latter is infeasible in manipulation scenarios, where the hands are often located outside the field of view.…”
Section: 3D Scene Understanding (mentioning)
confidence: 99%
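The anticipation task described in this statement can be framed as per-timestep regression of the final 3D target from the partially observed hand trajectory. Below is a minimal sketch of that formulation, assuming a simple GRU regressor; the module and variable names are illustrative and do not reproduce the architecture of Li et al. (2022).

```python
# Minimal sketch of the early target-anticipation formulation described above:
# at every timestep of a partially observed hand trajectory, regress the final
# 3D target location. The module and variable names are hypothetical
# illustrations, not the architecture of Li et al. (2022).
import torch
import torch.nn as nn


class TargetAnticipator(nn.Module):
    """GRU that maps an observed 3D hand trajectory to a predicted 3D target."""

    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.GRU(input_size=3, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)  # regress (x, y, z) of the target

    def forward(self, hand_xyz: torch.Tensor) -> torch.Tensor:
        # hand_xyz: (batch, T_observed, 3) hand positions seen so far
        features, _ = self.encoder(hand_xyz)
        return self.head(features)  # (batch, T_observed, 3): one estimate per step


# "As early as possible": evaluate the estimate produced after only a fraction
# of the full manipulation has been observed.
model = TargetAnticipator()
full_trajectory = torch.randn(1, 60, 3)          # toy stand-in for a real action
observed = full_trajectory[:, : int(0.3 * 60)]   # first 30% of the action
early_prediction = model(observed)[:, -1]        # target estimate at that point
print(early_prediction.shape)                    # torch.Size([1, 3])
```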
“…It comprises more than 500K synchronised RGBD frames and gravity directions captured from an egocentric viewpoint during diverse daily activities, for a total of 16 h of RGBD recording. EgoPAT3D (Li et al., 2022) is a large multimodality dataset of more than a million frames of RGB-D and IMU streams, designed for the task of anticipating the target location of a person's object manipulation action in a 3D workspace. The total collection contains 150 recordings, 15 household scene point clouds, 15,000 hand-object actions, 600 min of raw RGB-D/IMU data, 0.9 million hand-object action frames, and 1 million RGB-D frames.…”
Section: 3D Scene Understanding (mentioning)
confidence: 99%
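The statistics above describe three loosely coupled streams (per-scene point clouds, RGB-D frames, and IMU readings). The sketch below shows one plausible way to pair them by nearest timestamp; the directory layout, file names, and field order are assumptions made for illustration and are not the published EgoPAT3D format.

```python
# Hedged sketch of pairing the three streams the statistics above mention
# (per-scene point clouds, RGB-D frames, IMU readings) by nearest timestamp.
# The directory layout, file names, and field order are assumptions made for
# illustration only, not the published EgoPAT3D format.
from pathlib import Path

import numpy as np
import open3d as o3d


def iter_recording(root: Path):
    # Assumed: one background point cloud per household scene.
    scene_cloud = o3d.io.read_point_cloud(str(root / "scene.ply"))

    # Assumed: IMU packets stored as rows of [timestamp, ax, ay, az, gx, gy, gz].
    imu = np.loadtxt(root / "imu.csv", delimiter=",")

    # Assumed: per-frame RGB-D data saved as .npz files named by capture timestamp.
    for frame_path in sorted((root / "rgbd").glob("*.npz")):
        frame = np.load(frame_path)
        t = float(frame_path.stem)
        # Associate the frame with the IMU packet closest in time.
        imu_row = imu[np.argmin(np.abs(imu[:, 0] - t))]
        yield frame["rgb"], frame["depth"], imu_row, scene_cloud
```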
“…Recent advances in egocentric action recognition, anticipation, and retrieval focus on building powerful clip-based video models that operate on video clips of a few seconds at a time [12,16,18,25,43,44,54,55]. Despite encouraging performance, these models typically process denselysampled frames with temporally-aware operations, making them computationally heavy.…”
Section: Introduction (mentioning)
confidence: 99%