Camera/image-based localization is important for many emerging applications such as augmented reality (AR), mixed reality, robotics, and self-driving. Camera localization is the problem of estimating both the position and the orientation of a camera with respect to an object. Use cases for camera localization depend on two key factors: accuracy and speed (latency). Therefore, this paper proposes Depth-DensePose, an efficient deep learning model for 6-degrees-of-freedom (6-DoF) camera-based localization. Depth-DensePose combines the advantages of DenseNets and an adapted depthwise separable convolution (DS-Conv) to build a deeper and more efficient network. The proposed model consists of iterative depth-dense blocks. Each depth-dense block contains two adapted DS-Conv layers with kernel sizes 3 and 5, which help retain both low-level and high-level features. We evaluate the proposed Depth-DensePose on the Cambridge Landmarks dataset, where it outperforms related deep learning models for camera-based localization. Furthermore, extensive experiments show that the adapted DS-Conv is more efficient than standard convolution, especially in terms of memory and processing time, which are important for real-time and mobile applications.
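To give a rough sense of why a depthwise separable convolution can save memory, the sketch below counts weights for a standard convolution and for a textbook depthwise separable convolution at the two kernel sizes named above. It is an illustration under stated assumptions, not the paper's adapted DS-Conv: the function names and the channel width of 64 are hypothetical, and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def ds_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a textbook depthwise separable convolution (bias omitted).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_ratio(c_in: int, c_out: int, k: int) -> float:
    """Fraction of the standard-convolution weights that DS-Conv needs."""
    return ds_conv_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)


if __name__ == "__main__":
    channels = 64  # hypothetical width, not taken from the paper
    for k in (3, 5):  # the two kernel sizes used in each depth-dense block
        std = standard_conv_params(channels, channels, k)
        ds = ds_conv_params(channels, channels, k)
        ratio = reduction_ratio(channels, channels, k)
        print(f"k={k}: standard={std}, depthwise-separable={ds}, ratio={ratio:.3f}")
```

Under these assumptions the ratio simplifies to 1/c_out + 1/k², so the saving grows with the kernel size, which is one reason a 5 x 5 branch is much cheaper as a DS-Conv than as a standard convolution.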