Wavelet-Based Joint Estimation and Encoding of Depth-Image-Based Representations for Free-Viewpoint Rendering

IEEE Trans. on Image Process.

Ortega

2011

Abstract: The encoding of both texture and depth maps of multiview images, captured by a set of spatially correlated cameras, is important for any 3-D visual communication system based on depth-image-based rendering (DIBR). In this paper, we address the problem of efficient bit allocation among texture and depth maps of multiview images. More specifically, suppose we are given a coding tool to encode texture and depth maps at the encoder and a view-synthesis tool to construct intermediate views at the decoder using neighboring encoded texture and depth maps. Our goal is to determine how to best select captured views for encoding and distribute available bits among texture and depth maps of selected coded views, such that the visual distortion of desired constructed views is minimized. First, in order to obtain at the encoder a low-complexity estimate of the visual quality of a large number of desired synthesized views, we derive a cubic distortion model based on basic DIBR properties, whose parameters are obtained using only a small number of viewpoint samples. Then, we demonstrate that the optimal selection of coded views and quantization levels for corresponding texture and depth maps is equivalent to the shortest path in a specially constructed 3-D trellis. Finally, we show that, using the assumptions of monotonicity in the predictor's quantization level and distance, suboptimal solutions can be efficiently pruned from the feasible space during solution search. Experiments show that our proposed efficient selection of coded views and quantization levels for corresponding texture and depth maps outperforms an alternative scheme using constant quantization levels for all maps (commonly used in video standard implementations) by up to 1.5 dB. Moreover, the complexity of our scheme can be reduced by at least 80% over the full solution search.

Index Terms: Bit allocation, depth-image-based rendering, 3-D image coding.
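The shortest-path formulation mentioned in the abstract can be illustrated with a generic dynamic-programming search over a trellis. This is only a minimal sketch, not the paper's 3-D trellis: it assumes one stage per coded view, one state per candidate quantization choice (which could be a flattened texture/depth level pair), and hypothetical cost inputs `node_cost` and `edge_cost` that stand in for whatever rate-distortion cost the encoder actually uses.

```python
def trellis_shortest_path(node_cost, edge_cost):
    """Minimum-cost path through a trellis via dynamic programming.

    node_cost[i][q]: hypothetical cost of picking state q at stage i.
    edge_cost(i, p, q): hypothetical cost of moving from state p at
    stage i-1 to state q at stage i (models the dependency between stages).
    Returns (total_cost, path) where path[i] is the chosen state at stage i.
    """
    n_stages = len(node_cost)
    n_states = len(node_cost[0])
    best = list(node_cost[0])   # best[q]: cheapest cost of reaching state q
    back = []                   # back[i-1][q]: best predecessor of q at stage i
    for i in range(1, n_stages):
        new_best, ptr = [], []
        for q in range(n_states):
            # Cheapest predecessor for state q at this stage.
            p_star = min(range(n_states),
                         key=lambda p: best[p] + edge_cost(i, p, q))
            new_best.append(best[p_star] + edge_cost(i, p_star, q)
                            + node_cost[i][q])
            ptr.append(p_star)
        best = new_best
        back.append(ptr)
    # Trace back from the cheapest final state.
    q = min(range(n_states), key=lambda k: best[k])
    total = best[q]
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    path.reverse()
    return total, path


# Illustrative numbers only: 3 stages, 2 states per stage.
costs = [[1, 5], [4, 1], [2, 4]]
total, path = trellis_shortest_path(costs, lambda i, p, q: abs(p - q))
```

The search visits every (predecessor, state) pair once per stage, so its cost grows linearly in the number of stages rather than exponentially as in an exhaustive enumeration of all combinations.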

Section: B Motion/Disparity Compensation Coding Tools and DIBR View

On Dependent Bit Allocation for Multiview Image Coding With Depth-Image-Based Rendering

IEEE Trans. on Image Process.

Ortega

2011

“…Novel tools for encoding texture maps [1], [2] and depth maps [4], [5] of multiview images have been recently proposed, but how bits should be optimally allocated among texture and depth maps for maximum fidelity is not addressed.…”

Section: Related Work

“…Using pixel and depth maps of neighboring views, intermediate views can be synthesized via depth-image-based rendering (DIBR) [3] at high fidelity. Efficient coding tools for depth maps, with unique characteristics such as smooth surfaces and sharp edges, have also been proposed recently [4], [5].…”

Section: Introduction

Bit allocation and encoded view selection for optimal multiview image representation

2010 IEEE International Workshop on Multimedia Signal Processing

2010

Abstract: Novel coding tools have been proposed recently to encode texture and depth maps of multiview images, exploiting inter-view correlations, for depth-image-based rendering (DIBR). However, the important associated bit allocation problem for DIBR remains open: for chosen view coding and synthesis tools, how should bits be allocated among texture and depth maps across encoded views so that the fidelity of a set of V views reconstructed at the decoder is maximized for a fixed bitrate budget? In this paper, we present an optimization strategy to select a subset of texture and depth maps of the original V views for encoding at appropriate quantization levels, so that at the decoder, the combined quality of decoded views (using encoded texture maps) and synthesized views (using encoded texture and depth maps of neighboring views) is maximized. We show that, using the monotonicity property, the complexity of our strategy can be greatly reduced. Experiments show that our strategy achieves a PSNR gain of up to 0.83 dB over a heuristic scheme that encodes only the texture maps of all V views at constant quantization levels. Further, computation can be reduced by up to 66% over a full parameter search approach.

“…Given the model, one can easily find the view location between two captured coded views where the maximum synthesized distortion occurs. Using a state-of-the-art multiview image codec based on the shape-adaptive wavelet transform (SA-WT) [5], we show how optimal bit allocation can be performed to minimize the maximum synthesized distortion at any intermediate viewpoint in a computationally efficient manner. For the case when there are only two captured viewpoints, we show experimentally that the optimal bit allocation can outperform a commonly deployed uniform bit allocation scheme by up to 1.0 dB in visual quality measured in Peak Signal-to-Noise Ratio (PSNR).…”

Section: Introduction

Bit allocation for multiview image compression using cubic synthesized view distortion model

2011 IEEE International Conference on Multimedia and Expo

Chakareski

2011

"Texture-plus-depth" has become a popular coding format for multiview image compression, where a decoder can synthesize images at intermediate viewpoints using encoded texture and depth maps of the closest captured view locations via depth-image-based rendering (DIBR). As in other resource-constrained scenarios, limited available bits must be optimally distributed among captured texture and depth maps to minimize the expected signal distortion at the decoder. A specific challenge of multiview image compression for DIBR is that the encoder must allocate bits without knowing how many and which specific virtual views will be synthesized at the decoder for viewing. In this paper, we derive a cubic synthesized view distortion model to describe the visual quality of an interpolated view as a function of the view's location. Given the model, one can easily find the virtual view location between two coded views where the maximum synthesized distortion occurs. Using a multiview image codec based on the shape-adaptive wavelet transform, we show how optimal bit allocation can be performed to minimize the maximum view synthesis distortion at any intermediate viewpoint. Our experimental results show that the optimal bit allocation can outperform a common uniform bit allocation scheme by up to 1.0 dB in coding efficiency, while simultaneously being competitive with a state-of-the-art H.264 codec.
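To see why a cubic model makes the worst-case view easy to locate, consider the following minimal sketch. It assumes the distortion between two coded views is written as a cubic polynomial in a normalized view position v in [0, 1]; the coefficient names `a`, `b`, `c`, `d` and the helper `max_of_cubic` are hypothetical and are not taken from the paper. The maximum must lie at an endpoint or at a root of the derivative, so only a handful of candidates need to be evaluated.

```python
import math


def max_of_cubic(a, b, c, d, lo=0.0, hi=1.0):
    """Maximize f(v) = a*v**3 + b*v**2 + c*v + d over [lo, hi].

    Returns (v_max, f(v_max)). Candidates are the interval endpoints
    plus any real roots of f'(v) = 3a*v**2 + 2b*v + c inside the interval.
    """
    def f(v):
        return ((a * v + b) * v + c) * v + d

    candidates = [lo, hi]
    if abs(a) > 1e-12:
        # Quadratic derivative: use the quadratic formula.
        disc = 4.0 * b * b - 12.0 * a * c
        if disc >= 0.0:
            root = math.sqrt(disc)
            candidates.append((-2.0 * b + root) / (6.0 * a))
            candidates.append((-2.0 * b - root) / (6.0 * a))
    elif abs(b) > 1e-12:
        # Model degenerates to a quadratic: single stationary point.
        candidates.append(-c / (2.0 * b))

    feasible = [v for v in candidates if lo <= v <= hi]
    v_best = max(feasible, key=f)
    return v_best, f(v_best)


# Illustrative coefficients only: f(v) = v**3 - 2*v**2 + v
v_star, d_star = max_of_cubic(1.0, -2.0, 1.0, 0.0)
```

With closed-form candidates like these, an encoder would not need to synthesize and measure every intermediate view to find the worst one, which is the practical appeal of a low-order polynomial model.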