We propose a completely unsupervised approach that simultaneously estimates scene depth, ego-pose, ground segmentation, and the ground normal vector from monocular RGB video sequences alone. In our approach, the estimates for the different scene structures mutually benefit one another through joint optimization. Specifically, we use a mutual information loss to pre-train the ground segmentation network before adding the corresponding self-supervised labels obtained by a geometric method. By exploiting the static nature of the ground and its normal vector, scene depth and ego-motion can be learned efficiently through the self-supervised learning procedure. Extensive experiments on both the Cityscapes and KITTI benchmarks demonstrate that our approach significantly improves estimation accuracy for both scene depth and ego-pose. We also achieve an average error of about 3° for the estimated ground normal vectors. By deploying our proposed geometric constraints, the IoU accuracy of unsupervised ground segmentation increases by 35% on the Cityscapes dataset.
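As a rough illustration of one geometric step the abstract alludes to (not the authors' implementation), the ground normal can be recovered by fitting a plane to 3D points back-projected from the estimated depth of ground pixels. The sketch below assumes the ground points are already given as an N×3 array in camera coordinates and fits the plane by SVD; the orientation convention (normal pointing toward the camera's negative y-axis) is an assumption for illustration.

```python
import numpy as np

def ground_normal_from_points(points):
    """Fit a plane n.x + d = 0 to ground points (N x 3) and return its unit normal.

    The smallest right-singular vector of the mean-centered points is the
    direction of least variance, i.e. the plane normal.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    # Assumed orientation convention: flip so the normal points "up"
    # (negative y in a y-down camera frame).
    if normal[1] > 0:
        normal = -normal
    return normal / np.linalg.norm(normal)

def angular_error_deg(n_est, n_gt):
    """Angle in degrees between two unit normals (the metric behind the ~3° figure)."""
    cos = np.clip(np.dot(n_est, n_gt), -1.0, 1.0)
    return np.degrees(np.arccos(cos))
```

For example, points sampled from the plane y = 0.5 yield a normal of approximately (0, -1, 0) under this convention.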
Accurate image feature point detection and matching are essential to computer vision tasks such as panoramic image stitching and 3D reconstruction. However, ordinary feature point approaches cannot be applied directly to fisheye images, because their severe distortion invalidates the ordinary camera model. To address this problem, this paper proposes a self-supervised learning method for feature point detection and matching on fisheye images. The method uses a Siamese network to automatically learn the correspondence of feature points across transformed image pairs, avoiding high annotation costs. Because fisheye image datasets are scarce, a two-stage viewpoint transform pipeline is also adopted for image augmentation to increase data variety. Furthermore, the method adopts both deformable convolution and a contrastive learning loss to improve feature extraction and description in distorted image regions. Compared with traditional feature point detectors and matchers, the method demonstrates superior performance on fisheye images.
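To make the contrastive-loss idea concrete, here is a minimal InfoNCE-style sketch (an assumption for illustration, not the paper's exact loss): descriptors of corresponding points across the two views of a transformed pair are treated as positives, and all other pairings in the batch serve as negatives. The `temperature` value is a common default, not taken from the paper.

```python
import numpy as np

def info_nce_loss(desc_a, desc_b, temperature=0.07):
    """Contrastive loss over two aligned descriptor batches (N x D).

    Row i of desc_a and row i of desc_b describe the same physical point
    in the two views; every other row pairing acts as a negative.
    """
    # L2-normalize so dot products are cosine similarities.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_probs))
```

With correctly matched descriptor pairs the loss is near zero, while mismatched pairings (e.g. a rolled batch) produce a much larger loss, which is what drives descriptors of the same point together and of different points apart.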