NeuralHOFusion: Neural Volumetric Rendering under Human-object Interactions

Jiang, Yongli; Jiang, Suyi; Sun, Guoxing; Su, Zhuo; Guo, Kaiwen; Wu, Minye; Yu, Jingyi; Xu, Lu

doi:10.1109/cvpr52688.2022.00606

Cited by 30 publications

(15 citation statements)

References 52 publications

“…Recent approaches begin to tackle modeling and synthesizing human interactions within 3D scenes, or with objects. Most of the researches focus on statically posing humans within the given 3D environment [16,24,69,71], by generating human scene interaction poses from various types of input including object semantics [17], images [21,23,64,65,68], and text descriptions [49,72].…”

Section: Related Work (mentioning)

confidence: 99%

Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments

Lee, Joo

2023

Preprint

“…Modeling dynamic scenes. Recent works have trained global object NeRF from monocular input [LNSW21, GSKH21], capture dynamic effects by overfitting to a global 4D space‐time volume [XHKK21, CJ23, FKMW*23], and explicitly capture human interactions [JJS*22, SGF*22]. Researchers have investigated the effect of segmentation, tracking, and NeRF modeling tasks in other efforts.…”

Section: Related Work (mentioning)

confidence: 99%

Factored Neural Representation for Scene Understanding

Wong, Mitra

2023

Computer Graphics Forum

A long‐standing goal in scene understanding is to obtain interpretable and editable representations that can be directly constructed from a raw monocular RGB‐D video, without requiring specialized hardware setup or priors. The problem is significantly more challenging in the presence of multiple moving and/or deforming objects. Traditional methods have approached the setup with a mix of simplifications, scene priors, pretrained templates, or known deformation models. The advent of neural representations, especially neural implicit representations and radiance fields, opens the possibility of end‐to‐end optimization to collectively capture geometry, appearance, and object motion. However, current approaches produce global scene encoding, assume multiview capture with limited or no motion in the scenes, and do not facilitate easy manipulation beyond novel view synthesis. In this work, we introduce a factored neural scene representation that can directly be learned from a monocular RGB‐D video to produce object‐level neural presentations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformations (e.g., nonrigid movement). We evaluate ours against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., change object trajectory). Code and data are available at: http://geometry.cs.ucl.ac.uk/projects/2023/factorednerf/.

“…Rendering can be fast [Esposito et al 2022; Li et al 2022a; Lin et al 2022; Reiser et al 2023] to even work on mobile devices [Cao et al 2023]. While real-time reconstruction is significantly more difficult, careful optimization and camera parameter refinement permits fast capture and view synthesis [Clark 2022; Haitz et al 2023; Jiang et al 2023; Müller et al 2022b; Rosinol et al 2022]. Other approaches demonstrate their application on video data with dynamic content [Li et al 2022b, 2023; Song et al 2022].…”

Section: Related Work (mentioning)

confidence: 99%

LiveNVS: Neural View Synthesis on Live RGB-D Streams

Fink, Rückert, Franke et al.

2023

SIGGRAPH Asia 2023 Conference Papers
