Learning End-to-End Scene Flow by Distilling Single Tasks Knowledge

Aleotti, Filippo; Poggi, Matteo; Tosi, Fabio; Mattoccia, Stefano

doi:10.1609/aaai.v34i07.6613

Cited by 31 publications

(26 citation statements)

References 49 publications

“…As shown in table 1, the refinement model improves over the feedforward baseline in all three metrics, thus supporting the effectiveness of the consistency guided refinement process. Our model also outperforms DWARF [1] by a large margin. Again, this shows the benefit of geometrically modelling the consistency of scene flow outputs.…”

Section: Evaluation On Synthetic Data (mentioning)

confidence: 72%

“…To leverage correlation between tasks, Jiang et al [15] propose an encoder architecture shared among the tasks of disparity, optical flow, and segmentation. Similarly, Aleotti et al [1] propose a lightweight architecture to share information between tasks. Complementary to previous work, our main objective is to improve generalization of the scene flow by means of additional constraints, self-supervised losses, and learnt refinement schemes.…”

Section: Related Work (mentioning)

confidence: 99%

“…We use the results produced by the feedforward module without a refinement network as baseline. To facilitate the comparison with the state-of-the-art, we test against a recent model DWARF [1] in the same experimental setting. As shown in table 1, the refinement model improves over the feedforward baseline in all three metrics, thus supporting the effectiveness of the consistency guided refinement process.…”

Section: Evaluation On Synthetic Data (mentioning)

confidence: 99%

“…Many end-to-end models powered by deep neural network have emerged for the scene flow estimation [1,15] and its components [5,13,29], and they showed promise in benchmarks. However, training such models relies on the availability of labeled data.…”

Section: Introduction (mentioning)

confidence: 99%

Consistency Guided Scene Flow Estimation

Chen, Gool, Schmid, et al. 2020

Lecture Notes in Computer Science

Consistency Guided Scene Flow Estimation (CGSF) is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video. The model takes two temporal stereo pairs as input, and predicts disparity and scene flow. The model self-adapts at test time by iteratively refining its predictions. The refinement process is guided by a consistency loss, which combines stereo and temporal photo-consistency with a geometric term that couples disparity and 3D motion. To handle inherent modeling error in the consistency loss (e.g. Lambertian assumptions) and for better generalization, we further introduce a learned, output refinement network, which takes the initial predictions, the loss, and the gradient as input, and efficiently predicts a correlated output update. In multiple experiments, including ablation studies, we show that the proposed model can reliably predict disparity and scene flow in challenging imagery, achieves better generalization than the state-of-the-art, and adapts quickly and robustly to unseen domains.

“…The benefit of our split decoder is that competitive accuracy is achieved more stably and in fewer training iterations (at 56% of the full training schedule), with a lighter network (∼ 10% fewer parameters). 1 Figure 2. Decoder configuration: (a) A single joint decoder [23], (b) removing the context network, and (c) our split decoder design.…”

Section: Refined Backbone Architecture (mentioning)

confidence: 99%

Self-Supervised Monocular Scene Flow Estimation

Hur, Roth 2020

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

113

Estimating 3D scene flow from a sequence of monocular images has been gaining increased attention due to the simple, economical capture setup. Owing to the severe ill-posedness of the problem, the accuracy of current methods has been limited, especially that of efficient, real-time approaches. In this paper, we introduce a multi-frame monocular scene flow network based on self-supervised learning, improving the accuracy over previous networks while retaining real-time efficiency. Based on an advanced two-frame baseline with a split-decoder design, we propose (i) a multi-frame model using a triple frame input and convolutional LSTM connections, (ii) an occlusion-aware census loss for better accuracy, and (iii) a gradient detaching strategy to improve training stability. On the KITTI dataset, we observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.

Beyond Local Reasoning for Stereo Confidence Estimation with Deep Learning

Tosi, Poggi, Benincasa, et al. 2018

Lecture Notes in Computer Science

Confidence measures for stereo have gained popularity in recent years due to their improved capability to detect outliers and the increasing number of applications exploiting these cues. In this field, convolutional neural networks achieved top performance compared to other known techniques in the literature by processing local information to tell disparity assignments from outliers. Despite these outstanding achievements, all approaches rely on clues extracted with small receptive fields, thus ignoring most of the overall image content. Therefore, in this paper, we propose to exploit nearby and farther clues available from image and disparity domains to obtain a more accurate confidence estimation. While local information is very effective for detecting high frequency patterns, it lacks insights from farther regions in the scene. On the other hand, enlarging the receptive field allows the inclusion of clues from farther regions but produces a smoother uncertainty estimation, which is not particularly accurate when dealing with high frequency patterns. For these reasons, we propose in this paper a multi-stage cascaded network to combine the best of the two worlds. Extensive experiments on three datasets using three popular stereo algorithms prove that the proposed framework outperforms state-of-the-art confidence estimation techniques.
