To refer to or to cite this work, please use the citation to the published version: Verhack, R., Sikora, T., Lange, L., Van Wallendael, G., and Lambert, P. (2016 Ghent University -iMinds -Data Science Lab, Ghent, Belgium † Technische Universität Berlin -Communication Systems Lab, Berlin, Germany ABSTRACT Our challenge is the design of a "universal" bit-efficient image compression approach. The prime goal is to allow reconstruction of images with high quality. In addition, we attempt to design the coder and decoder "universal", such that MPEG-7-like low-and mid-level descriptors are an integral part of the coded representation. To this end, we introduce a sparse Mixture-of-Experts regression approach for coding images in the pixel domain. The underlying stochastic process of the pixel amplitudes are modelled as a 3-dimensional and multi-modal Mixture-of-Gaussians with K modes. This closed form continuous analytical model is estimated using the Expectation-Maximization algorithm and describes segments of pixels by local 3-D Gaussian steering kernels with global support. As such, each component in the mixture of experts steers along the direction of highest correlation. The conditional density then serves as the regression function. Experiments show that a considerable compression gain is achievable compared to JPEG for low bitrates for a large class of images, while forming attractive low-level descriptors for the image, such as the local segmentation boundaries, direction of intensity flow and the distribution of these parameters over the image.
The proposed framework, called Steered Mixture-ofExperts (SMoE), enables a multitude of processing tasks on light fields using a single unified Bayesian model. The underlying assumption is that light field rays are instantiations of a non-linear or non-stationary random process that can be modeled by piecewise stationary processes in the spatial domain. As such, it is modeled as a space-continuous Gaussian Mixture Model. Consequently, the model takes into account different regions of the scene, their edges, and their development along the spatial and disparity dimensions.Applications presented include light field coding, depth estimation, edge detection, segmentation, and view interpolation. The representation is compact, which allows for very efficient compression yielding state-of-the-art coding results for low bit-rates. Furthermore, due to the statistical representation, a vast amount of information can be queried from the model even without having to analyze the pixel values. This allows for "blind" light field processing and classification Index Terms-light field coding, depth estimation, light field representations, mixture-of-experts, mixture models
Research in light field (LF) processing has heavily increased over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as it is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data, such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higherdimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model consists thus of a set of kernels which define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for highfrequency information, the proposed method performs comparable to the state of the art for low-to-mid range bitrates with respect to subjective visual quality of 4-D LF images. In case of 5-D LF video, we observe superior decorrelation and coding performance with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.