APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video

Lin, Rui; Lu, Yao; Lu, Guangming

doi:10.1007/978-3-030-36189-1_28

Cited by 6 publications

(10 citation statements)

References 20 publications


“…Featdepth [17] introduced the FeatureNet network architecture for single-view reconstruction based on the cross-view reconstruction networks DepthNet and PosNet. Feature losses generated by FeatureNet are used to constrain the overall network depth map reconstruction, but the additional feature reconstruction network increases the computational burden of the system. Geometric priors are introduced in [7, 14, 33], which consider the 3D consistency between point clouds back-projected from adjacent views.…”

Section: Self-supervised Depth Estimation (mentioning)

confidence: 99%

CFDepthNet: Monocular Depth Estimation Introducing Coordinate Attention and Texture Features

Feng, Zhu, Wang, et al. 2023

Preprint


Estimating depth in low-texture regions with a photometric error loss is challenging: pixels in low-texture (or texture-free) regions produce multiple local minima, which makes convergence difficult. In this paper, in addition to the photometric loss, we introduce a texture feature metric loss as a constraint and combine it with a coordinate attention mechanism to improve the depth map's texture quality and edge detail. The method uses a simple, compact network structure, a dedicated loss function, and a flexible embedded attention module, making it more effective and easier to deploy on robotic platforms with limited computing power. Tests show that the network achieves high-quality, state-of-the-art results on the KITTI dataset, and that the same trained model also performs well on the Cityscapes and Make3D datasets.
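The low-texture problem described in this abstract can be illustrated with a toy 1D example. The sketch below is not taken from CFDepthNet; the function name `photometric_error_per_shift` and the sample rows are hypothetical. It scores every candidate pixel shift (a stand-in for disparity) by mean absolute intensity difference, and shows that a textured row has a single best shift while a constant row scores every shift equally.

```python
import numpy as np

def photometric_error_per_shift(target_row, source_row, max_shift):
    """Mean absolute intensity error for each candidate shift s,
    comparing target[i] with source[i + s] on the overlapping part."""
    errors = []
    for s in range(max_shift + 1):
        n = len(target_row) - s
        diff = np.abs(target_row[:n] - source_row[s:s + n])
        errors.append(float(diff.mean()))
    return errors

true_shift = 2

# Textured row: intensities vary, so only the true shift aligns the rows.
textured = np.array([0.0, 3.0, 1.0, 7.0, 2.0, 9.0, 4.0, 8.0])
errors_textured = photometric_error_per_shift(
    textured, np.roll(textured, true_shift), max_shift=3)

# Texture-free row: every shift gives the same (zero) error,
# so the photometric loss alone cannot pick the correct one.
flat = np.full(8, 5.0)
errors_flat = photometric_error_per_shift(
    flat, np.roll(flat, true_shift), max_shift=3)

print("textured:", errors_textured)
print("flat:    ", errors_flat)
```

In the flat case the loss carries no gradient toward the true shift, which is the motivation the abstract gives for adding a feature-based constraint.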



“…In the past many years, the method of depth detection with lidar has been studied extensively [12, 13, 14, 15, 16] while estimating depth information from a single image taken by a monocular camera is attracting more research interest [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. Monocular depth estimation is essentially vague and a technically ill-posed problem: With only an image, there will be an infinite number of possible world scenes where the image comes from.…”

Section: Introduction (mentioning)

confidence: 99%

“…The difficulty of monocular depth estimation has attracted considerable attention for over a decade, and researchers have developed many methods to complete the task. Generally, their methods can be categorized into two kinds: methods based on hand-crafted features and probabilistic graphical models [18, 19, 20, 21, 22], and methods using convolutional neural networks (CNN) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43].…”

Section: Introduction (mentioning)

confidence: 99%

“…The self-supervised method (also known as the unsupervised method) is also a popular strategy for depth estimation [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43] which usually uses monocular image sequences in video streams as training sets and geometric constraints of the sequences are based on projections between adjacent frames. This method does not mean that a network is trained without any supervised information, it means the algorithm can automatically obtain the supervised relation between the unlabeled data and the data generated during the training period.…”

Section: Introduction (mentioning)

confidence: 99%

“…Mahjourian et al [36] estimated the depth with CNN and proposed a novel self-supervised strategy that considers the consistency of the estimated 3D point clouds and the ego-motion across successive frames. Besides Mahjourian et al, Liu et al [37] also proposed a self-supervised novel method, the Attention-Pixel and Attention-Channel Network (APAC-Net), for self-supervised monocular learning of estimating scene depth and ego-motion. The Temporal-consistency loss LTemp between adjacent frames and the Scale-based loss LScale among different scales were utilized to train the network.…”

Section: Introduction (mentioning)

confidence: 99%


Monocular Depth Estimation with Self-Supervised Learning for Vineyard Unmanned Agricultural Vehicle

Cui, Feng, Wang, et al. 2022

Sensors


To find an economical solution to infer the depth of the surrounding environment of unmanned agricultural vehicles (UAV), a lightweight depth estimation model called MonoDA based on a convolutional neural network is proposed. A series of sequential frames from monocular videos are used to train the model. The model is composed of two subnetworks—the depth estimation subnetwork and the pose estimation subnetwork. The former is a modified version of U-Net that reduces the number of bridges, while the latter takes EfficientNet-B0 as its backbone network to extract the features of sequential frames and predict the pose transformation relations between the frames. The self-supervised strategy is adopted during the training, which means the depth information labels of frames are not needed. Instead, the adjacent frames in the image sequence and the reprojection relation of the pose are used to train the model. Subnetworks’ outputs (depth map and pose relation) are used to reconstruct the input frame, then a self-supervised loss between the reconstructed input and the original input is calculated. Finally, the loss is employed to update the parameters of the two subnetworks through the backward pass. Several experiments are conducted to evaluate the model’s performance, and the results show that MonoDA has competitive accuracy over the KITTI raw dataset as well as our vineyard dataset. Besides, our method also possessed the advantage of non-sensitivity to color. On the computing platform of our UAV’s environment perceptual system NVIDIA JETSON TX2, the model could run at 18.92 FPS. To sum up, our approach provides an economical solution for depth estimation by using monocular cameras, which achieves a good trade-off between accuracy and speed and can be used as a novel auxiliary depth detection paradigm for UAVs.
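The reconstruction step described in this abstract (depth map plus pose relation used to rebuild the input frame, followed by a loss against the original) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the MonoDA implementation: the function names, the pinhole intrinsics matrix `K`, the 4x4 target-to-source pose `T`, nearest-neighbour sampling, and the plain L1 loss are all choices made for this example. Trainable pipelines typically use differentiable bilinear sampling instead.

```python
import numpy as np

def backproject(depth, K):
    """Lift each pixel (u, v) to a 3D point: X = depth * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    rays = np.linalg.inv(K) @ pix.astype(np.float64)
    return rays * depth.reshape(1, -1)            # shape (3, h*w)

def reconstruct_target(source, depth, K, T):
    """Rebuild the target frame by sampling `source` at the locations
    where the target pixels project, given target depth and pose T."""
    h, w = depth.shape
    pts = backproject(depth, K)
    pts_src = T[:3, :3] @ pts + T[:3, 3:4]        # move points into source frame
    proj = K @ pts_src                            # project with the pinhole model
    z = np.clip(proj[2], 1e-6, None)
    us = np.rint(proj[0] / z).astype(int)         # nearest-neighbour sampling
    vs = np.rint(proj[1] / z).astype(int)
    valid = (proj[2] > 0) & (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
    recon = np.zeros(h * w, dtype=np.float64)
    recon[valid] = source[vs[valid], us[valid]]
    return recon.reshape(h, w), valid.reshape(h, w)

def photometric_l1(target, recon, valid):
    """Mean absolute error over pixels that project inside the source image."""
    return float(np.abs(target - recon)[valid].mean())

# Usage: with an identity pose the reconstruction equals the source frame.
K = np.array([[10.0, 0.0, 2.0], [0.0, 10.0, 2.0], [0.0, 0.0, 1.0]])
frame = np.tile(np.arange(5, dtype=np.float64), (4, 1))
depth = np.full((4, 5), 2.0)
recon, valid = reconstruct_target(frame, depth, K, np.eye(4))
print(photometric_l1(frame, recon, valid))
```

In training, the depth and pose would come from the two subnetworks, and the loss would be backpropagated through both; here they are fixed arrays so the geometry can be checked in isolation.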


Deep Learning for Visual Localization and Mapping: A Survey

Chen, Wang, et al. 2024

IEEE Trans. Neural Netw. Learning Syst.


Deep-learning-based localization and mapping approaches have recently emerged as a new research direction and receive significant attention from both industry and academia. Instead of creating hand-designed algorithms based on physical models or geometric theories, deep learning solutions provide an alternative to solve the problem in a data-driven way. Benefiting from the ever-increasing volumes of data and computational power on devices, these learning methods are fast evolving into a new area that shows potential to track self-motion and estimate environmental models accurately and robustly for mobile agents. In this work, we provide a comprehensive survey and propose a taxonomy for the localization and mapping methods using deep learning. This survey aims to discuss two basic questions: whether deep learning is promising for localization and mapping, and how deep learning should be applied to solve this problem. To this end, a series of localization and mapping topics are investigated, from the learning-based visual odometry and global relocalization to mapping, and simultaneous localization and mapping (SLAM). It is our hope that this survey organically weaves together the recent works in this vein from robotics, computer vision, and machine learning communities and serves as a guideline for future researchers to apply deep learning to tackle the problem of visual localization and mapping.

