Depth estimation is a crucial step toward 3D scene understanding. Most traditional systems rely on direct sensing of depth by means of photogrammetry or stereo imaging. As scenes grow more complex, these modalities are impeded by factors such as occlusion and imperfect lighting, so reconstructed surfaces are often left with voids due to missing data, and surface regularization is required as post-processing. With recent advances in deep learning, depth inference from a monocular image has attracted considerable interest, and many convolutional architectures have been proposed for this task with promising results. However, the visual cues learned and generalized by these networks can be ambiguous, leading to inaccurate estimates. To address these issues, this paper presents an effective method for fusing two point clouds of the same scene: one extracted from depth values directly measured by an infrared camera, and the other from depth estimated by a modified ResNet-50 applied to an RGB image. To find correspondences between these point clouds and align them robustly and efficiently, an information-theoretic alignment strategy, called CEICP, is proposed. Experimental results on a public dataset demonstrate that the proposed method outperforms its counterparts while producing good-quality surface renditions of the underlying scene.
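Since the abstract does not detail CEICP itself, the sketch below only illustrates the surrounding fusion pipeline under generic assumptions: each depth map (the infrared measurement and the ResNet-50 estimate) is back-projected through assumed pinhole intrinsics (fx, fy, cx, cy) into a point cloud, and the two clouds are then registered with a plain point-to-point ICP as a stand-in for the paper's information-theoretic alignment. All function names are illustrative, not taken from the paper.

```python
# Minimal sketch (not the paper's CEICP): back-project two depth maps into
# point clouds and align them with a standard point-to-point ICP.
import numpy as np
from scipy.spatial import cKDTree

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map (metres) into an Nx3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0                      # skip missing (void) measurements
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)[valid]

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def icp_align(source, target, iters=30):
    """Iteratively match nearest neighbours and refit the rigid transform."""
    src = source.copy()
    tree = cKDTree(target)
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        _, idx = tree.query(src)       # correspondences by nearest neighbour
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, src
```

The proposed CEICP would presumably replace the nearest-neighbour correspondence search and least-squares cost above with its information-theoretic criterion; this sketch shows only the generic scaffolding into which such an alignment step would fit.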