Convolutional Neural Networks (CNNs) have performed extremely well on many image analysis tasks. However, supervised training of deep CNN architectures requires huge amounts of labelled data, which is unavailable for light field images. In this paper, we leverage synthetic light field images and propose a two-stream CNN that learns to estimate the disparities of multiple correlated neighbourhood pixels from their Epipolar Plane Images (EPIs). Since the two EPIs are unrelated except at their intersection, the two-stream network learns convolution weights individually for each EPI and then combines the outputs of the two streams for disparity estimation. The CNN-estimated disparity map is then refined using the central RGB light field image as a prior in a variational technique. We also propose a new real-world dataset comprising light field images of 19 objects captured with a Lytro Illum camera in outdoor scenes, together with their corresponding 3D pointclouds, captured with a 3dMD scanner, as ground truth. This dataset will be made public to allow precise 3D pointcloud-level comparison of algorithms in the future, which is currently not possible. Experiments on the synthetic and real-world datasets show that our algorithm outperforms the existing state of the art for depth estimation from light field images.
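The link between EPIs and disparity that the two streams exploit can be illustrated with a minimal NumPy sketch (not the paper's network): a scene point traces a line in the EPI whose slope equals its disparity, so the shear that best aligns all views reveals the disparity.

```python
import numpy as np

def make_epi(signal, n_views, disparity):
    """Synthesize a toy EPI: view v sees the scanline shifted by disparity * v."""
    x = np.arange(signal.size, dtype=float)
    return np.stack([np.interp(x - disparity * v, x, signal) for v in range(n_views)])

def estimate_disparity(epi, candidates, crop=12):
    """Brute-force slope search: the shear that minimizes variance across
    views (i.e. best aligns the EPI lines) gives the disparity."""
    n_views, width = epi.shape
    x = np.arange(width, dtype=float)
    best_d, best_score = None, np.inf
    for d in candidates:
        sheared = np.stack([np.interp(x + d * v, x, epi[v]) for v in range(n_views)])
        score = sheared[:, crop:-crop].var(axis=0).mean()  # skip extrapolated borders
        if score < best_score:
            best_d, best_score = d, score
    return best_d

x = np.arange(100, dtype=float)
scanline = np.sin(x / 5.0) + 0.5 * np.sin(x / 2.3)
epi = make_epi(scanline, n_views=9, disparity=0.5)
d_hat = estimate_disparity(epi, np.linspace(-1.0, 1.0, 21))
```

A learned two-stream model replaces this exhaustive per-pixel search with convolutions over the horizontal and vertical EPI patches, fusing the two feature streams before regressing disparity.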
3D object grounding aims to locate the most relevant target object in a raw point cloud scene based on a free-form language description. Understanding complex and diverse descriptions, and lifting them directly to a point cloud, is a new and challenging topic due to the irregular and sparse nature of point clouds. There are three main challenges in 3D object grounding: finding the main focus in the complex and diverse description; understanding the point cloud scene; and locating the target object. In this paper, we address all three challenges. Firstly, we propose a language scene graph module to capture the rich structure and long-distance phrase correlations. Secondly, we introduce a multi-level 3D proposal relation graph module to extract the object-object and object-scene co-occurrence relationships, and to strengthen the visual features of the initial proposals. Lastly, we develop a description-guided 3D visual graph module that encodes global contexts of phrases and proposals through a node-matching strategy. Extensive experiments on challenging benchmark datasets (ScanRefer [3] and Nr3D [42]) show that our algorithm outperforms the existing state of the art. Our code is available at https://github.com/PNXD/FFL-3DOG.
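The node-matching idea behind the description-guided visual graph can be sketched in a few lines. This is a toy cosine-similarity matcher in NumPy; the function name, feature layout, and scoring are illustrative assumptions, not the paper's actual module:

```python
import numpy as np

def match_target(phrase_feats, proposal_feats, target_phrase):
    """Toy node matching: score every 3D proposal against every language
    phrase by cosine similarity, then return the proposal that best matches
    the phrase carrying the description's main focus (the target)."""
    p = phrase_feats / np.linalg.norm(phrase_feats, axis=1, keepdims=True)
    q = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    sim = p @ q.T                      # (n_phrases, n_proposals)
    return int(np.argmax(sim[target_phrase]))

# Hypothetical features: phrase 0 is the target; proposal 2 points the same way.
phrases = np.array([[1.0, 0.0, 0.2], [0.0, 1.0, 0.0]])
proposals = np.array([[0.0, 1.0, 0.1], [0.2, 0.9, 0.0], [0.9, 0.1, 0.2]])
picked = match_target(phrases, proposals, target_phrase=0)
```

In the actual pipeline the phrase and proposal features are contextualized by their respective graphs before matching, which is what lets the model resolve relational descriptions rather than isolated object names.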
Reconstructing 3D facial geometry from a single RGB image has recently attracted wide research interest. However, it is still an ill-posed problem, and most methods rely on prior models, which undermines the accuracy of the recovered 3D faces. In this paper, we exploit the Epipolar Plane Images (EPIs) obtained from light field cameras and learn CNN models that recover horizontal and vertical 3D facial curves from the respective horizontal and vertical EPIs. Our 3D face reconstruction network (FaceLFnet) comprises a densely connected architecture to learn accurate 3D facial curves from low-resolution EPIs. To train the proposed FaceLFnets from scratch, we synthesize photorealistic light field images from 3D facial scans. The curve-by-curve 3D face estimation approach allows the networks to learn from only 14K images of 80 identities, which still comprise over 11 million EPIs/curves. The estimated facial curves are merged into a single pointcloud, to which a surface is fitted to obtain the final 3D face. Our method is model-free, requires only a few training samples to learn FaceLFnet, and can reconstruct 3D faces with high accuracy from single light field images under varying poses, expressions and lighting conditions. Comparisons on the BU-3DFE and BU-4DFE datasets show that our method reduces reconstruction errors by over 20% compared to the recent state of the art.
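The horizontal and vertical EPIs fed to the two networks can be sliced directly from a 4D light field. A minimal sketch follows; the array layout `(n_u, n_v, height, width)` and the slicing conventions are assumptions for illustration:

```python
import numpy as np

def horizontal_epi(lf, v0, y0):
    """EPI over the horizontal angular axis u and the image column axis,
    at a fixed vertical view v0 and fixed image row y0."""
    return lf[:, v0, y0, :]          # shape (n_u, width)

def vertical_epi(lf, u0, x0):
    """EPI over the vertical angular axis v and the image row axis,
    at a fixed horizontal view u0 and fixed image column x0."""
    return lf[u0, :, :, x0]          # shape (n_v, height)

lf = np.zeros((9, 9, 64, 48))        # toy light field: 9x9 views of 64x48 images
h = horizontal_epi(lf, v0=4, y0=10)  # one horizontal EPI, shape (9, 48)
v = vertical_epi(lf, u0=4, x0=20)    # one vertical EPI, shape (9, 64)
```

Each pixel of the central view thus has one horizontal and one vertical EPI; the per-pixel depths regressed from them form the horizontal and vertical 3D facial curves that are later merged into a single pointcloud.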