GIMO: Gaze-Informed Human Motion Prediction in Context

Zheng, Yang; Yang, Y.B.; Mo, Kaichun; Li, Jiaman; Chen, Yu; Liu, Yebin; Liu, C. Karen; Guibas, Leonidas J.

doi:10.1007/978-3-031-19778-9_39

Cited by 32 publications

(9 citation statements)

References 53 publications


“…At the same time, the work presented in Li et al (2023) achieves the best performance on a set of egocentric datasets (Luo et al, 2021;Zheng et al, 2022) captured from outward-looking camera perspective, including their proposed synthetic egocentric dataset. Given that directly matching egocentric video with full-body pose is challenging due to the frequent absence of visible body parts, the authors address the task by introducing an intermediate step of head motion estimation.…”

Section: State-of-the-art Papers (mentioning)

confidence: 98%

An Outlook into the Future of Egocentric Vision

Plizzari, Goletto, Furnari et al. 2024

Int J Comput Vis


What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward facing cameras and digital overlays, is expected to be integrated in our every day lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate explorations so as to unlock our path to the future always-on, personalised and life-enhancing egocentric vision.



“…We use the GIMO dataset [Zheng et al 2022] as another test dataset for evaluating the generalization ability of the proposed method on out-of-distribution data.…”

Section: Datasets (mentioning)

confidence: 99%

“…For each sequence, we compute the mean of the non-collision scores for all the objects in the scene. In Table 2, we compare the mean non-collision scores on the smoothed PROXD dataset [Zhang et al 2021], which was used during training, and the unseen GIMO dataset [Zheng et al 2022], which also provides SMPL-X parameters for humans interacting with scenes.…”

Section: Contact Object Recovery (mentioning)

confidence: 99%


Scene Synthesis from Human Motion

Wang et al. 2022

SIGGRAPH Asia 2022 Conference Papers


Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone contains rich information about the scene they reside in and interact with. For example, a sitting human suggests the existence of a chair, and their leg position further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans. Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community. https://sites.google.com/stanford.edu/summon


“…Then, the output viewport embedding $f_{m-s}$ is expected to be aware of the 3D video, which results in the final viewport embedding $f_{m-g}$. Inspired by [35], we handle the gaze embedding in a bidirectional manner, i.e., the viewport embedding $f_m$ is also utilized as the query to update the gaze features into $f_{g-m}$. The bidirectionally fused multi-modal features are then assembled into holistic temporal input representations to perform human viewport prediction.…”

Section: Design (mentioning)

confidence: 99%

The Shenzhen-Hong Kong Dialectics

Hu 2020

The Shenzhen Phenomenon


Recent years have witnessed a rapid development of immersive multimedia which bridges the gap between the real world and virtual space. Volumetric videos, as an emerging representative 3D video paradigm that empowers extended reality, stand out by providing an unprecedented immersive and interactive video-watching experience. Despite the tremendous potential, research on 3D volumetric video is still in its infancy and relies on sufficient and complete datasets for further exploration. However, most existing volumetric video datasets include only a single object and lack details about the scene and the interaction between them. In this paper, we focus on point clouds, currently the most widely used data format, and for the first time release a full-scene volumetric video dataset that includes multiple people and their daily activities interacting with the external environments. A comprehensive dataset description and analysis are provided, along with potential uses of this dataset. The dataset and additional tools can be accessed via the following website: https://cuhkszinml.github.io/full_scene_volumetric_video_dataset/.
