In this paper we present a scalable 3D video framework for capturing and rendering dynamic scenes. The acquisition system is based on multiple sparsely placed 3D video bricks, each comprising a projector, two grayscale cameras, and a color camera. Relying on structured light with complementary patterns, the system acquires texture images and pattern-augmented views of the scene simultaneously via time-multiplexed projections and synchronized camera exposures. Using space-time stereo on the acquired pattern images, high-quality depth maps are extracted, and the corresponding surface samples are merged into a view-independent, point-based 3D data structure. This representation enables effective photo-consistency enforcement and outlier removal, significantly reducing visual artifacts and yielding high rendering quality with EWA volume splatting. Our framework and its view-independent representation also allow for simple and straightforward editing of 3D video; to demonstrate this flexibility, we show compositing techniques and spatiotemporal effects.
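As a hedged illustration of the pipeline stage that merges per-view depth maps into the view-independent point representation, the following Python sketch back-projects a depth map into world space and filters samples by photo-consistency across views. All names here (unproject, photo_consistent, the pinhole parameters K, R, t, and the variance threshold) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def unproject(depth, K, R, t):
    """Lift a depth map (H x W) into world space via the pinhole model x = K(RX + t)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N homogeneous pixels
    rays = np.linalg.inv(K) @ pix                 # camera-space viewing rays
    cam_pts = rays * depth.reshape(1, -1)         # scale each ray by its depth
    return (R.T @ (cam_pts - t.reshape(3, 1))).T  # world-space points, N x 3

def photo_consistent(colors, max_var=0.01):
    """Keep a surface sample only if the views observing it agree in color.

    colors: (num_views, 3) array of normalized RGB observations of one sample.
    """
    return float(np.var(colors, axis=0).mean()) < max_var
```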
We present a novel representation and rendering method for free-viewpoint video of human characters based on multiple input video streams. The basic idea is to approximate the articulated 3D shape of the human body by a subdivision into textured billboards along the skeleton structure. Billboards are clustered into fans such that each skeleton bone carries one billboard per source camera; we call this representation articulated billboards. We describe a semi-automatic, data-driven algorithm to construct and render this representation, which robustly handles even challenging acquisition scenarios characterized by sparse camera positioning, inaccurate camera calibration, low video resolution, or occlusions in the scene. First, for each input view, a 2D pose estimation based on image silhouettes, motion capture data, and temporal video coherence is used to create a segmentation mask for each body part. Then, from the 2D poses and the segmentation, the actual articulated billboard model is constructed by a 3D joint optimization that also compensates for camera calibration errors. The rendering method includes a novel way of blending the textural contributions of each billboard and features an adaptive seam correction that eliminates visible discontinuities between adjacent billboard textures. Our articulated billboards not only minimize the ghosting artifacts known from conventional billboard rendering, but also alleviate the setup restrictions and error sensitivities of more complex 3D representations and multi-view reconstruction techniques. Our results demonstrate the flexibility and robustness of our approach with high-quality free-viewpoint video generated from broadcast footage of challenging, uncontrolled environments.
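The following sketch illustrates, under stated assumptions, how a billboard fan might be organized and how the textural contributions per source camera could be blended by angular proximity to the novel viewpoint. The data layout and the cosine-lobe weighting are illustrative choices of ours; the paper's actual blending scheme is more elaborate.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Billboard:
    cam_dir: np.ndarray   # unit direction from the bone toward the source camera
    texture: np.ndarray   # segmented body-part texture taken from that camera

@dataclass
class BillboardFan:
    bone: str                                       # skeleton bone this fan is attached to
    billboards: list = field(default_factory=list)  # one Billboard per source camera

    def blend_weights(self, view_dir, sharpness=8.0):
        """Weight each billboard by how closely its camera matches the novel view."""
        cos = np.array([max(0.0, float(b.cam_dir @ view_dir)) for b in self.billboards])
        w = cos ** sharpness          # sharpen the lobe so nearby cameras dominate
        total = w.sum()
        return w / total if total > 0 else w
```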
We present 3D video fragments, a dynamic point-sample framework for real-time free-viewpoint video. By generalizing 2D video pixels to 3D irregular point samples, we combine the simplicity of conventional 2D video processing with the power of more complex polygonal representations for free-viewpoint video. We propose a differential update scheme that exploits the spatio-temporal coherence of the video streams of multiple cameras: updates are issued by operators such as inserts and deletes that account for changes in the input video images. The operators from multiple cameras are processed, merged into a 3D video stream, and transmitted to a remote site. We also introduce a novel concept for camera control that dynamically selects the set of cameras relevant for reconstruction and adapts to the processing load and rendering platform. Our framework is generic in the sense that it works with any real-time 3D reconstruction method that extracts depth from images. The video renderer displays free-viewpoint video using an efficient point-based splatting scheme and exploits state-of-the-art vertex and pixel processing hardware for real-time visual processing.
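A minimal sketch of the differential update idea, assuming a per-pixel keying of point samples: consecutive frames are diffed and insert/delete operators are emitted only where the input changed. The operator encoding and the diff criterion are illustrative guesses at the concept, not the paper's stream format.

```python
from enum import Enum

class Op(Enum):
    INSERT = 0
    DELETE = 1

def diff_frame(prev, curr, eps=1e-3):
    """Emit insert/delete operators between two frames of point samples.

    prev, curr: dicts mapping (cam_id, pixel) -> (position, color) tuples.
    """
    ops = [(Op.DELETE, key, None) for key in prev.keys() - curr.keys()]
    for key, sample in curr.items():
        old = prev.get(key)
        # Re-insert a sample when it is new or its position moved noticeably.
        if old is None or max(abs(a - b) for a, b in zip(old[0], sample[0])) > eps:
            ops.append((Op.INSERT, key, sample))
    return ops
```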
We present the 3D Video Recorder, a system capable of recording, processing, and playing three-dimensional video from multiple points of view. We first record 2D video streams from several synchronized digital video cameras and store pre-processed images to disk. An off-line processing stage converts these images into a time-varying, three-dimensional, hierarchical point-based data structure and stores this 3D video to disk. We show how to trade off 3D video quality against processing performance and devise efficient compression and coding schemes for our novel 3D video representation. A typical sequence is encoded at less than 7 megabits per second at a frame rate of 8.5 frames per second. The 3D video player decodes and renders 3D videos from hard disk in real time, providing interaction features known from common video cassette recorders, such as variable-speed forward and reverse and slow motion. 3D video playback can be enhanced with novel 3D video effects such as freeze-and-rotate and arbitrary scaling. The player builds upon point-based rendering techniques and is thus capable of rendering high-quality images in real time. Finally, we demonstrate the 3D Video Recorder on multiple real-life video sequences.
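A quick back-of-envelope check of the reported coding rate, assuming the stated 7 Mbit/s at 8.5 frames per second: each encoded 3D frame has a budget of roughly 0.82 Mbit, i.e. about 103 kB.

```python
# Per-frame budget implied by the reported figures: < 7 Mbit/s at 8.5 fps.
rate_bits_per_s = 7e6
fps = 8.5
bits_per_frame = rate_bits_per_s / fps                      # ~823,529 bits
print(f"{bits_per_frame / 8 / 1000:.0f} kB per 3D frame")   # -> 103 kB
```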