Most existing deep learning approaches to monocular depth estimation need highly accurate ground-truth depth to train a supervised model. However, accurate depth information is not always available, particularly for diverse outdoor scenes. To address this, a convolutional network architecture is proposed that comprises two encoder-decoders and uses a stereo matching criterion for training. The image reconstruction error, rather than ground-truth depth, is used to optimize the network parameters. To estimate an accurate disparity map in low-textured and occluded regions, a cross-based cost-aggregation loss term is proposed, along with a novel occlusion detection and filling method in the post-processing stage. Among unsupervised approaches on the KITTI 2015 dataset, the proposed method achieves an improvement of 6.219% in RMS error for a depth cap of 80 m and 6.15% in RMS error for a depth cap of 50 m. The importance of a channel-wise descriptor for training a deep neural network is also established through the performance measures.
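As a rough illustration of training on image reconstruction error instead of ground-truth depth, the sketch below warps the right image of a rectified stereo pair into the left view using a predicted left disparity map and scores the result with a mean absolute difference. The function names, the grayscale input, the linear interpolation, and the L1 penalty are all assumptions made for this example; the abstract does not specify the exact form of the loss, and the cost-aggregation term and occlusion handling are not shown.

```python
import numpy as np

def warp_right_to_left(right, disp_left):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    Assumes rectified grayscale images of shape (H, W) and a disparity
    map in pixels; both function names are hypothetical.
    """
    h, w = disp_left.shape
    # Source x-coordinate in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disp_left
    xs = np.clip(xs, 0.0, w - 1.0)  # clamp samples that fall outside the image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation along the horizontal (epipolar) direction.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def reconstruction_loss(left, right, disp_left):
    """Mean absolute photometric error between the left image and its reconstruction."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disp_left))))
```

In a training setup of this kind, the disparity map would come from the network and the loss would be minimized with respect to the network parameters, so no depth labels are needed, only stereo pairs.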