Abstract: Free-viewpoint video conferencing allows a participant to observe the remote 3D scene from any freely chosen viewpoint. An intermediate virtual-viewpoint image is commonly synthesized via depth-image-based rendering (DIBR) using two pairs of transmitted texture and depth maps from two neighboring captured viewpoints. To maintain high quality in the synthesized images, it is imperative to contain the adverse effects of network packet losses that may arise during texture and depth video transmission. Towards this end, we develop an integrated approach that exploits the representation redundancy inherent in the multiple streamed videos: a voxel in the 3D scene visible to two captured views is sampled and coded twice, once in each view. In particular, at the receiver we first develop an error-concealment strategy that adaptively blends corresponding pixels in the two captured views during DIBR, so that pixels from the more reliably transmitted view are weighted more heavily. We then couple it with a sender-side optimization of reference picture selection (RPS) during real-time video coding, so that blocks containing samples of voxels visible in both views are more error-resiliently coded in one view only, given that adaptive blending will mask errors in the other view. Further, we analyze the sensitivity of synthesized-view distortion to texture versus depth errors, so that the relative importance of texture and depth code blocks can be computed for system-wide RPS optimization. Experimental results show that the proposed scheme can outperform the use of a traditional feedback channel by up to 0.82 dB on average at an 8% packet loss rate, and by as much as 3 dB for particular frames.

Index Terms: Free-viewpoint video conferencing, reference picture selection, error concealment, depth-image-based rendering.
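The receiver-side concealment described above weights each view's contribution by its transmission reliability. A minimal sketch of such reliability-weighted blending, with inverse expected error as an illustrative weight (function and parameter names are hypothetical, not from the paper):

```python
def blend_views(pix_left, pix_right, err_left, err_right, eps=1e-6):
    """Adaptively blend corresponding pixels from the two captured views
    during DIBR.

    pix_left / pix_right: pixel values projected from the left and right
    captured views into the virtual viewpoint.
    err_left / err_right: per-pixel estimates of channel-induced error
    for each view; a lower value means a more reliably received pixel.

    Weights are the inverse of the error estimates, so the more reliable
    view dominates the blend (an assumed weighting scheme for
    illustration).
    """
    w_left = 1.0 / (err_left + eps)
    w_right = 1.0 / (err_right + eps)
    return (w_left * pix_left + w_right * pix_right) / (w_left + w_right)
```

With equal error estimates the blend reduces to a plain average; as one view's estimated error grows, the synthesized pixel converges to the other view's sample.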
In a free-viewpoint video conferencing system, the viewer can choose any desired viewpoint of the 3D scene for observation. Images for an arbitrarily chosen viewpoint can be rendered through depth-image-based rendering (DIBR), which typically employs the "texture-plus-depth" video format for 3D data exchange. Robust and timely transmission of multiple texture and depth maps over bandwidth-constrained, loss-prone networks is a challenging problem. In this paper, we optimize the transmission of multiview video in texture-plus-depth format over a lossy channel for free-viewpoint synthesis at the decoder. In particular, we construct a recursive model to estimate the distortion in the synthesized view due to errors in both texture and depth maps, and we formulate a rate-distortion optimization problem to select reference pictures for macroblock encoding in H.264 in a computation-efficient manner, providing unequal protection to different macroblocks. Results show that the proposed scheme can outperform random insertion of intra-refresh blocks by up to 0.73 dB at a 5% loss rate.

Index Terms: Depth-image-based rendering, video streaming
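The recursive distortion model and rate-distortion-optimized reference selection can be sketched as follows. This is a minimal illustration assuming a standard per-block loss model (encoder distortion plus attenuated propagation from the reference, or concealment distortion on loss); the function names, the attenuation factor, and the candidate-tuple layout are assumptions, not the paper's exact formulation:

```python
def expected_distortion(p_loss, d_enc, d_ref, d_conceal, attenuation=0.9):
    """Recursive expected distortion for one block.

    With probability p_loss the packet is lost and the block is
    concealed (distortion d_conceal); otherwise the block incurs its
    encoding distortion d_enc plus error propagated from its possibly
    corrupted reference, attenuated by spatial filtering.
    """
    return p_loss * d_conceal + (1.0 - p_loss) * (d_enc + attenuation * d_ref)


def select_reference(candidates, lmbda):
    """Pick the reference picture minimizing the Lagrangian RD cost.

    candidates: list of (ref_id, rate_bits, exp_dist) tuples, where
    exp_dist is the block's expected distortion (e.g., from
    expected_distortion) when coded against that reference.
    """
    best = min(candidates, key=lambda c: c[2] + lmbda * c[1])
    return best[0]
```

Older, already-acknowledged references yield a smaller propagated term d_ref at a higher rate cost, so the minimization naturally assigns stronger protection to blocks whose errors would most degrade the synthesized view.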