Depth estimation from monocular camera sensors is an important technique in computer vision. Supervised monocular depth estimation requires large amounts of data acquired from depth sensors. However, acquiring depth data is expensive, and in some cases the data cannot be acquired at all because of sensor limitations. View-synthesis-based depth estimation is a self-supervised learning approach that does not require depth supervision. Previous studies have mainly used CNN-based encoders. CNNs are well suited to extracting local features through convolution, whereas recent vision transformers are well suited to extracting global features with multi-head self-attention modules. In this paper, we propose a hybrid network that combines a CNN and a vision transformer for self-supervised monocular depth estimation. We design an encoder-decoder structure that uses CNNs in the earlier stages to extract local features and a vision transformer in the later stages to extract global features. We evaluate the proposed network through various experiments on the KITTI and Cityscapes datasets. The results show higher performance than previous studies with fewer parameters and less computation. Code and trained models are available at https://github.com/fogfog2/manydepthformer.
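The layout described above (convolutional stages for local features early, self-attention stages for global features later, then a decoder that predicts depth) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation, which is in the linked repository. The stage count, channel widths, head count, the simple skip-connection decoder, the sigmoid disparity output, and all class names (`ConvStage`, `TransformerStage`, `HybridDepthNet`) are hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvStage(nn.Module):
    """Hypothetical early stage: strided conv that halves resolution (local features)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class TransformerStage(nn.Module):
    """Hypothetical late stage: downsample, then self-attention over all spatial tokens."""

    def __init__(self, in_ch: int, out_ch: int, num_heads: int = 4, depth: int = 1):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        layer = nn.TransformerEncoderLayer(
            d_model=out_ch, nhead=num_heads, dim_feedforward=2 * out_ch, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        x = self.down(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)         # (B, H*W, C): one token per pixel
        tokens = self.encoder(tokens)                 # global (all-to-all) attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class HybridDepthNet(nn.Module):
    """CNN stages first, transformer stages later, then a skip-connection decoder."""

    def __init__(self, channels=(32, 64, 128, 256)):
        super().__init__()
        c1, c2, c3, c4 = channels
        self.stages = nn.ModuleList([
            ConvStage(3, c1),          # 1/2 resolution
            ConvStage(c1, c2),         # 1/4
            TransformerStage(c2, c3),  # 1/8
            TransformerStage(c3, c4),  # 1/16
        ])
        # Each decoder conv fuses the upsampled deeper feature with one skip feature.
        self.fuse = nn.ModuleList(
            nn.Conv2d(channels[i] + channels[i - 1], channels[i - 1], 3, padding=1)
            for i in range(len(channels) - 1, 0, -1)
        )
        self.head = nn.Conv2d(c1, 1, 3, padding=1)

    def forward(self, image):
        feats, x = [], image
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        x = feats[-1]
        for conv, skip in zip(self.fuse, reversed(feats[:-1])):
            x = F.interpolate(x, size=skip.shape[-2:], mode="nearest")
            x = F.elu(conv(torch.cat([x, skip], dim=1)))
        x = F.interpolate(x, scale_factor=2, mode="nearest")  # back to input resolution
        return torch.sigmoid(self.head(x))  # disparity-like map in (0, 1)


if __name__ == "__main__":
    model = HybridDepthNet().eval()
    with torch.no_grad():
        disp = model(torch.randn(1, 3, 64, 64))
    print(disp.shape)
```

Placing attention only in the low-resolution stages keeps the token count small (a 64×64 input yields 8×8 and 4×4 token grids here), which is one way a hybrid design can limit the quadratic cost of self-attention. In a self-supervised setup the predicted map would be converted to depth and trained with a view-synthesis (photometric reprojection) loss rather than ground-truth depth; that training loop is omitted here.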