A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

Mayer, Norbert Michael; Ilg, Eddy; Häusser, Philip; Fischer, Philipp; Cremers, Daniel; Dosovitskiy, Alexey; Brox, Thomas

doi:10.1109/cvpr.2016.438

Cited by 2,343 publications

(2,525 citation statements)

References 30 publications

Supporting

Mentioning

2,506

Contrasting

Unclassified

Order By: Relevance

“…Our full-frame Coordinate CNN is based on the architecture of DispNet [20]. It is fully convolutional [16] such that dense per-pixel scene coordinate predictions can be generated from arbitrary-sized input images.…”

Section: A Network Architecturementioning

confidence: 99%

See 1 more Smart Citation

Full-Frame Scene Coordinate Regression for Image-Based Localization

Li¹,

Ylioinas²,

Kannala³

2018

Robotics: Science and Systems XIV

View full text Add to dashboard Cite

Abstract-Image-based localization, or camera relocalization, is a fundamental problem in computer vision and robotics, and it refers to estimating camera pose from an image. Recent state-ofthe-art approaches use learning based methods, such as Random Forests (RFs) and Convolutional Neural Networks (CNNs), to regress for each pixel in the image its corresponding position in the scene's world coordinate frame, and solve the final pose via a RANSAC-based optimization scheme using the predicted correspondences. In this paper, instead of in a patch-based manner, we propose to perform the scene coordinate regression in a full-frame manner to make the computation efficient at test time and, more importantly, to add more global context to the regression process to improve the robustness. To do so, we adopt a fully convolutional encoder-decoder neural network architecture which accepts a whole image as input and produces scene coordinate predictions for all pixels in the image. However, using more global context is prone to overfitting. To alleviate this issue, we propose to use data augmentation to generate more data for training. In addition to the data augmentation in 2D image space, we also augment the data in 3D space. We evaluate our approach on the publicly available 7-Scenes dataset, and experiments show that it has better scene coordinate predictions and achieves state-of-the-art results in localization with improved robustness on the hardest frames (e.g., frames with repeated structures).

show abstract

Section: A Network Architecturementioning

confidence: 99%

“…Moreover, shortcut connections are added in between to overcome the data bottleneck. Unlike DispNet [20], there is only one final output layer at the end of the network and no multi-scale side predictions are used. Instead of ReLU [22], we use ELU [5] for the nonlinearity between layers.…”

Section: A Network Architecturementioning

confidence: 99%

Full-Frame Scene Coordinate Regression for Image-Based Localization

Li¹,

Ylioinas²,

Kannala³

2018

Robotics: Science and Systems XIV

View full text Add to dashboard Cite

show abstract

“…Some algorithms use deep convolutional networks to learn the matching cost [25,26], and perform cost aggregation and optimization using other techniques.…”

Section: Supervised Techniquesmentioning

confidence: 99%

Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals

Coninx¹,

Bessìère²,

Droulez³

2017

International Journal of Approximate Reasoning

View full text Add to dashboard Cite

Reconstruction of the tridimensional geometry of a visual scene using the binocular disparity information is an important issue in computer vision and mobile robotics, which can be formulated as a Bayesian inference problem. However, computation of the full disparity distribution with an advanced Bayesian model is usually an intractable problem, and proves computationally challenging even with a simple model. In this paper, we show how probabilistic hardware using distributed memory and alternate representation of data as stochastic bitstreams can solve that problem with high performance and energy efficiency. We put forward a way to express discrete probability distributions using stochastic data representations and perform Bayesian fusion using those representations, and show how that approach can be applied to diparity computation. We evaluate the system using a simulated stochastic implementation and discuss possible hardware implementations of such architectures and their potential for sensorimotor processing and robotics.

show abstract

“…In addition to the matching cost estimation network, another convolutional neural network is also undertaken for obtain- ing the disparity map in place of winner-takes-all (WTA) strategy [7]. A large synthetic dataset is rendered to train an end-to-end network with images rather than small patches as input [8]. The attained disparity maps does not achieve state-of-the-art performance, yet it is able to recover occlusions, where most patch-based networks fail.…”

Section: Introductionmentioning

confidence: 99%

Feature Ensemble Network with Occlusion Disambiguation for Accurate Patch-Based Stereo Matching

Wang

et al. 2017

IEICE Trans. Inf. & Syst.

View full text Add to dashboard Cite

SUMMARYAccurate stereo matching remains a challenging problem in case of weakly-textured areas, discontinuities and occlusions. In this letter, a novel stereo matching method, consisting of leveraging feature ensemble network to compute matching cost, error detection network to predict outliers and priority-based occlusion disambiguation for refinement, is presented. Experiments on the Middlebury benchmark demonstrate that the proposed method yields competitive results against the state-of-the-art algorithms. key words: stereo matching, convolutional neural network, patch-based, occlusion disambiguation

show abstract

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

Cited by 2,343 publications

References 30 publications

Full-Frame Scene Coordinate Regression for Image-Based Localization

Full-Frame Scene Coordinate Regression for Image-Based Localization

Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals

Feature Ensemble Network with Occlusion Disambiguation for Accurate Patch-Based Stereo Matching

Contact Info

Product

Resources

About