Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation

Ilg, Eddy; Saikia, Tonmoy; Keuper, Margret; Brox, Thomas

doi:10.1007/978-3-030-01258-8_38

Cited by 189 publications

(222 citation statements)

References 50 publications

(123 reference statements)

Mentioning: 220


“…When applied to down-scaled images, these methods run faster, but give blurry results and inaccurate disparity estimates for the far-field. Recent "deep" stereo methods perform well on low-resolution benchmarks [5,11,16,21,38], while failing to produce SOTA results on high-res benchmarks [26]. This is likely due to: 1) Their architectures are not efficiently designed to operate on high-resolution images.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Deep Stereo Matching on High-Resolution Images

Yang, Manela, Happold, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Figure 1: Illustration of on-demand depth sensing with a coarse-to-fine hierarchy on the proposed dataset. Our method (HSM) captures the coarse layout of the scene in 91 milliseconds, finds the far-away car (shown in the red box) in 175 ms, and recovers the details of the car given extra 255 ms.

Abstract: We explore the problem of real-time stereo matching on high-res imagery. Many state-of-the-art (SOTA) methods struggle to process high-res imagery because of memory constraints or speed limitations. To address this issue, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy. Because high-res stereo datasets are relatively rare, we introduce a dataset with high-res stereo pairs for both training and evaluation. Our approach achieved SOTA performance on Middlebury-v3 and KITTI-15 while running significantly faster than its competitors. The hierarchical design also naturally allows for anytime on-demand reports of disparity by capping intermediate coarse results, allowing us to accurately predict disparity for near-range structures with low latency (30ms). We demonstrate that the performance-vs-speed tradeoff afforded by on-demand hierarchies may address sensing needs for time-critical applications such as autonomous driving.

“…For object detection, we use both the recurrent rolling convolution (RRC) detector [29] and Track R-CNN [38]. We use optical flow obtained from [16]. Bounding Boxes to Segmentation Masks.…”

Section: Our Approach (mentioning)

confidence: 99%

Track to Reconstruct and Reconstruct to Track

Luiten, Fischer, Leibe 2020

IEEE Robot. Autom. Lett.


Object tracking and 3D reconstruction are often performed together, with tracking used as input for reconstruction. However, the obtained reconstructions also provide useful information for improving tracking. We propose a novel method that closes this loop, first tracking to reconstruct, and then reconstructing to track. Our approach, MOTSFusion (Multi-Object Tracking, Segmentation and dynamic object Fusion), exploits the 3D motion extracted from dynamic object reconstructions to track objects through long periods of complete occlusion and to recover missing detections. Our approach first builds up short tracklets using 2D optical flow, and then fuses these into dynamic 3D object reconstructions. The precise 3D object motion of these reconstructions is used to merge tracklets through occlusion into long-term tracks, and to locate objects when detections are missing. On KITTI, our reconstruction-based tracking reduces the number of ID switches of the initial tracklets by more than 50%, and outperforms all previous approaches for both bounding box and segmentation tracking.


“…We train our proposed network with and without the self-supervision loss for the flow network. The officially provided pretrained model on FlyingChair dataset [10] is used if the self-supervision loss is disabled. Experimental results demonstrate that the flow network pretrained on the FlyingChair dataset [10] can generalize to our dataset, but with limited performance.…”

Section: Evaluation Metrics (mentioning)

confidence: 99%

“…The officially provided pretrained model on FlyingChair dataset [10] is used if the self-supervision loss is disabled. Experimental results demonstrate that the flow network pretrained on the FlyingChair dataset [10] can generalize to our dataset, but with limited performance. The resulting deblur network gives a PSNR metric as 31.23dB and a SSIM metric as 0.89 on our synthetic dataset, in contrast to 32.24dB/0.91 if the network is trained in a fully self-supervised manner.…”

Section: Evaluation Metrics (mentioning)

confidence: 99%

Self-Supervised Linear Motion Deblurring

Liu, Janai, Pollefeys, et al. 2020

IEEE Robot. Autom. Lett.


Motion blurry images challenge many computer vision algorithms, e.g., feature detection, motion estimation, or object recognition. Deep convolutional neural networks are state-of-the-art for image deblurring. However, obtaining training data with corresponding sharp and blurry image pairs can be difficult. In this paper, we present a differentiable reblur model for self-supervised motion deblurring, which enables the network to learn from real-world blurry image sequences without relying on sharp images for supervision. Our key insight is that motion cues obtained from consecutive images yield sufficient information to inform the deblurring task. We therefore formulate deblurring as an inverse rendering problem, taking into account the physical image formation process: we first predict two deblurred images from which we estimate the corresponding optical flow. Using these predictions, we re-render the blurred images and minimize the difference with respect to the original blurry inputs. We use both synthetic and real datasets for experimental evaluations. Our experiments demonstrate that self-supervised single image deblurring is really feasible and leads to visually compelling results. Both the code and datasets are available at https://github.com/ethliup/SelfDeblur.
