Deep regression for monocular camera-based 6-DoF global localization in outdoor environments

Naseer, Tayyab; Burgard, Wolfram

doi:10.1109/iros.2017.8205957

Cited by 133 publications

(85 citation statements)

References 18 publications

“…Considering the relative improvement, AnchorNet typically performs closer to other APR or RPR methods than to the best performing structure-based approach in each scene. It also fails to outperform the simple DenseVLAD baseline on the Street scene, which is the largest and most complex scene in the Cambridge Landmarks dataset [10,50].…”

Section: Experimental Comparison (mentioning)

confidence: 99%

Understanding the Limitations of CNN-Based Absolute Camera Pose Regression

Sattler

Zhou

Pollefeys

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

376

270

Visual localization is the task of accurate camera pose estimation in a known scene. It is a key problem in computer vision and robotics, with applications including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality. Traditionally, the localization problem has been tackled using 3D geometry. Recently, end-to-end approaches based on convolutional neural networks have become popular. These methods learn to directly regress the camera pose from an input image. However, they do not achieve the same level of pose accuracy as 3D structure-based methods. To understand this behavior, we develop a theoretical model for camera pose regression. We use our model to predict failure cases for pose regression techniques and verify our predictions through experiments. We furthermore use our model to show that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure. A key result is that current approaches do not consistently outperform a handcrafted image retrieval baseline. This clearly shows that additional research is needed before pose regression algorithms are ready to compete with structure-based methods.

“…We calculate the pixel-wise mean for each of the scenes in the datasets and subtract them with the input images. We experimented with augmenting the images using pose synthesis [27] and synthetic view synthesis [18], however they did not yield any performance gains, rather in some cases they negatively affected the pose accuracy. We found that using random crops of 224 × 224 pixels acts as a better regularizer helping the network generalize better in comparison to synthetic augmentation techniques while saving preprocessing time.…”

Section: B Network Training (mentioning)

confidence: 99%

Deep Auxiliary Learning for Visual Localization and Odometry

Valada

Radwan

Burgard

2018

2018 IEEE International Conference on Robotics and Automation (ICRA)

Self Cite

239

148

Localization is an indispensable component of a robot's autonomy stack that enables it to determine where it is in the environment, essentially making it a precursor for any action execution or planning. Although convolutional neural networks have shown promising results for visual localization, they are still grossly outperformed by state-of-the-art local feature-based techniques. In this work, we propose VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images. Our multitask model incorporates hard parameter sharing, thus being compact and enabling real-time inference, in addition to being end-to-end trainable. We propose a novel loss function that utilizes auxiliary learning to leverage relative pose information during training, thereby constraining the search space to obtain consistent pose estimates. We evaluate our proposed VLocNet on indoor as well as outdoor datasets and show that even our single task model exceeds the performance of state-of-the-art deep architectures for global localization, while achieving competitive performance for visual odometry estimation. Furthermore, we present extensive experimental evaluations utilizing our proposed Geometric Consistency Loss that show the effectiveness of multitask learning and demonstrate that our model is the first deep learning technique to be on par with, and in some cases outperforms state-of-the-art SIFT-based approaches.
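
The preprocessing described in the citation statement from this paper (subtracting a per-scene pixel-wise mean and taking random 224 × 224 crops) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the function names `scene_mean`, `random_crop`, and `preprocess` are hypothetical, and it assumes all images of a scene have already been resized to the same height and width, which the quoted text does not specify.

```python
import numpy as np


def scene_mean(images):
    """Pixel-wise mean over a stack of equally sized images, shape (N, H, W, C)."""
    return np.mean(np.asarray(images, dtype=np.float32), axis=0)


def random_crop(image, size=224, rng=None):
    """Return a random size x size crop of an (H, W, C) image."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    # Upper bound is exclusive, so +1 allows the crop to touch the border.
    top = int(rng.integers(0, height - size + 1))
    left = int(rng.integers(0, width - size + 1))
    return image[top:top + size, left:left + size]


def preprocess(image, mean_image, size=224, rng=None):
    """Subtract the scene mean, then take a random crop (assumed order)."""
    centered = image.astype(np.float32) - mean_image
    return random_crop(centered, size, rng)
```

Because a new crop is drawn on each call, the network sees a slightly different view of the same training image in every epoch, which is the regularizing effect the statement refers to.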

“…Camera Relocalization and Sports Camera Calibration: Camera relocalization has been widely studied in the context of global localization for robots using edge images [6], random forests [7], [8] and deep networks [9], [10], [11], [12].…”

Section: Related Work (mentioning)

confidence: 99%

Sports Camera Calibration via Synthetic Data

Chen

Little

2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Calibrating sports cameras is important for autonomous broadcasting and sports analysis. Here we propose a highly automatic method for calibrating sports cameras from a single image using synthetic data. First, we develop a novel camera pose engine. The camera pose engine has only three significant free parameters so that it can effectively generate a lot of camera poses and corresponding edge (i.e., field marking) images. Then, we learn compact deep features via a siamese network from paired edge image and camera pose and build a feature-pose database. After that, we use a novel two-GAN (generative adversarial network) model to detect field markings in real images. Finally, we query an initial camera pose from the feature-pose database and refine camera poses using truncated distance images. We evaluate our method on both synthetic and real data. Our method not only demonstrates the robustness on the synthetic data but also achieves the state-of-the-art accuracy on a standard soccer dataset and very high performance on a volleyball dataset.
