Neural Point-Based Graphics

Aliev, Kara-Ali; Sevastopolsky, Artem; Kolos, Maria; Ulyanov, Dmitry; Lempitsky, Victor

doi:10.48550/arxiv.1906.08240

Cited by 35 publications

(86 citation statements)

References 29 publications

“…The recent progress of differentiable neural rendering brings huge potential for 3D scene modeling and photorealistic novel view synthesis. Researchers explore various data representations to pursue better performance and characteristics, such as point-clouds [2,58,64], voxels [31], texture meshes [27,60] or implicit functions [7,33,34,36,43,63]. However, these methods…”

Section: Related Work (mentioning)

confidence: 99%

Neural Free-Viewpoint Performance Rendering under Complex Human-object Interactions

Sun, Chen, et al. 2021

Proceedings of the 29th ACM International Conference on Multimedia

human-object rendering scheme, which combines direction-aware neural blending weight learning and spatial-temporal texture completion to provide high-resolution and photo-realistic texture results in the occluded scenarios. Extensive experiments demonstrate the effectiveness of our approach to achieve high-quality geometry and texture reconstruction in free viewpoints for challenging human-object interactions. CCS CONCEPTS • Computing methodologies → Image-based rendering.

“…However, their work does not target realtime animation or dynamics, and the usage of a heavy U-Net for rendering the final result is not possible in our setting. Aliev et al [2] proposes neural point-based graphics, in which the geometry is represented as a point cloud. Each point is associated with a deep feature, and a neural net computes pixel values based on splatted feature points.…”

Section: Neural Rendering (mentioning)

confidence: 99%

“…To solve this issue and scale the rendering to the number of persons in the VR telepresence, we should compute only the visible pixels, thus upper bounding the computation by the number of pixels of the display. Recent works in neural rendering such as the deferred neural rendering [24], the neural point-based graphics [2], the implicit differentiable rendering [27], use neural network to compute pixel values in the screen space instead of the texture space thus computing only visible pixels. However, in all these works, either a static scene is assumed, or the viewing distance and perspective are not expected to be entirely free in the 3D space.…”

Section: Introduction (mentioning)

confidence: 99%

“…[11]) that achieve high details using sinusoidal functions, but require increasing the dimensionality by 20×, with corresponding computational costs. Secondly, in contrast to other works such as [24,2,27], we do not employ convolutions in screen space, but instead apply a shallow MLP at each contributing pixel. This has the advantage of avoiding visual artifacts during motion and stereo inconsistencies, as well as challenges in generalizing to changes in scale, rotation and perspective, all of which are common in interactive immersive 3D media.…”

Section: Introduction (mentioning)

confidence: 99%

Pixel Codec Avatars

Ma, Simon, Saragih, et al. 2021

Preprint

Telecommunication with photorealistic avatars in virtual or augmented reality is a promising path for achieving authentic face-to-face communication in 3D over remote physical distances. In this work, we present the Pixel Codec Avatars (PiCA): a deep generative model of 3D human faces that achieves state of the art reconstruction performance while being computationally efficient and adaptive to the rendering conditions during execution. Our model combines two core ideas: (1) a fully convolutional architecture for decoding spatially varying features, and (2) a rendering-adaptive per-pixel decoder. Both techniques are integrated via a dense surface representation that is learned in a weakly-supervised manner from low-topology mesh tracking over training images. We demonstrate that PiCA improves reconstruction over existing techniques across testing expressions and views on persons of different gender and skin tone. Importantly, we show that the PiCA model is much smaller than the state-of-art baseline model, and makes multi-person telecommunication possible: on a single Oculus Quest 2 mobile VR headset, 5 avatars are rendered in realtime in the same scene.

“…The 3D representations are learned from 2D images via differentiable rendering networks. Convolutional neural networks are used to predict volumetric representations via 3D voxel-grid features [40,25,31,27,16,17], point clouds [1,49], textured meshes [20,23,44] and multi-plane images [11,55]. The learnt representations are projected by a 3D-to-2D operation to synthesize images.…”

Section: Related Work (mentioning)

confidence: 99%

Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering

Kwon, Kim, Ceylan, et al. 2021

Preprint

In this paper, we aim at synthesizing a free-viewpoint video of an arbitrary human performance using sparse multi-view cameras. Recently, several works have addressed this problem by learning person-specific neural radiance fields (NeRF) to capture the appearance of a particular human. In parallel, some work proposed to use pixel-aligned features to generalize radiance fields to arbitrary new scenes and objects. Adopting such generalization approaches to humans, however, is highly challenging due to the heavy occlusions and dynamic articulations of body parts. To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture. Specifically, we first introduce a temporal transformer that aggregates tracked visual features based on the skeletal body motion over time. Moreover, a multi-view transformer is proposed to perform cross-attention between the temporally-fused features and the pixel-aligned features at each time step to integrate observations on the fly from multiple views. Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses. The video results and code are available at https://youngjoongunc.github.io/nhp.

Neural Point-Based Graphics

Abstract: as well as standard RGB cameras even in the presence of objects that are challenging for standard mesh-based modeling.