2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020
DOI: 10.1109/cvprw50498.2020.00510

Self-supervised Object Motion and Depth Estimation from Video

Abstract: We present a self-supervised learning framework to estimate individual object motion and monocular depth from video. We model each object's motion as a 6-degree-of-freedom (6-DoF) rigid-body transformation, and use instance segmentation masks to supply object-level information. Compared with methods that predict a dense optical flow map to model motion, our approach significantly reduces the number of values to be estimated. Our system eliminates the scale ambiguity of motion prediction through imposing…
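To make the parameter-count argument concrete, the NumPy sketch below (an illustration, not the authors' code) encodes one object's motion as a 6-vector [rx, ry, rz, tx, ty, tz], converts the axis-angle rotation with Rodrigues' formula, and applies that single rigid transform to every back-projected 3D point inside the object's instance mask. The axis-angle parameterization, the function names, the 128 x 416 resolution, and the object count are assumptions chosen for illustration.

```python
import numpy as np

def axis_angle_to_matrix(rvec):
    """Rodrigues' formula: axis-angle 3-vector -> 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)                      # no rotation
    k = rvec / theta                          # unit rotation axis
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])       # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def pose_to_matrix(pose6):
    """Hypothetical 6-DoF layout [rx, ry, rz, tx, ty, tz] -> 4x4 homogeneous transform."""
    pose6 = np.asarray(pose6, dtype=float)
    T = np.eye(4)
    T[:3, :3] = axis_angle_to_matrix(pose6[:3])
    T[:3, 3] = pose6[3:]
    return T

def move_object_points(points, mask, pose6):
    """Apply one rigid transform to the (H, W, 3) points selected by a boolean (H, W) instance mask."""
    T = pose_to_matrix(pose6)
    out = points.copy()
    out[mask] = points[mask] @ T[:3, :3].T + T[:3, 3]   # background points are left untouched
    return out

# Values to estimate per frame pair (assumed resolution and object count).
H, W = 128, 416
num_objects = 5
dense_flow_values = 2 * H * W          # one (du, dv) per pixel: 106496
object_motion_values = 6 * num_objects  # six numbers per object: 30
```

The same six numbers move every pixel of an object, which is why the object-level model needs far fewer outputs than a per-pixel flow field; the trade-off is that it assumes each masked object is rigid.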

Cited by 36 publications (14 citation statements); references 29 publications.

“…The core idea is to apply a differentiable warp and minimize the photometric reprojection error. Recent methods improve performance by incorporating coupled training with optical flow [Ranjan et al 2019;Yin and Shi 2018;Zou et al 2018], object motion [Dai et al 2019;Vijayanarasimhan et al 2017], surface normals [Qi et al 2018], edges, and visual odometry [Andraghetti et al 2019;Shi et al 2019;Wang et al 2018b]. Other notable efforts include using stereo information [Guo et al 2018;Watson et al 2019], better network architecture and training loss design [Gordon et al 2019;Guizilini et al 2019], a scale-consistent ego-motion network [Bian et al 2019], incorporating 3D geometric constraints [Mahjourian et al 2018], and learning from unknown camera intrinsics [Chen et al 2019b;Gordon et al 2019].…”
Section: Related Work (mentioning)
confidence: 99%
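The differentiable warp and photometric reprojection error mentioned in the statement above can be illustrated with a minimal NumPy sketch, assuming a pinhole camera with intrinsics K, a target-frame depth map, and a 4 x 4 relative pose from the target to the source frame. Each target pixel is back-projected with its depth, moved by the pose, re-projected into the source image, and the source is bilinearly sampled there. The loss is the mean absolute difference over pixels that land inside the image. The function names are hypothetical, and grayscale images with a plain L1 loss are simplifying assumptions. In training, the same operations would be written in an autodiff framework so that gradients reach the depth and pose predictions.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a (H, W) image at float pixel coords (x, y); also return an in-bounds mask."""
    H, W = img.shape
    valid = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom, valid

def warp_source_to_target(src, depth_t, K, T_t2s):
    """Inverse-warp the source image into the target view using target depth and relative pose."""
    H, W = depth_t.shape
    v, u = np.mgrid[0:H, 0:W].astype(float)
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1)   # homogeneous pixels, (3, H*W)
    cam = (np.linalg.inv(K) @ pix) * depth_t.reshape(1, -1)  # back-project into 3D (target frame)
    cam = T_t2s[:3, :3] @ cam + T_t2s[:3, 3:4]               # rigid motion into the source frame
    proj = K @ cam                                           # re-project with the same intrinsics
    in_front = proj[2] > 1e-6
    z = np.where(in_front, proj[2], 1.0)                     # avoid dividing by ~0 behind the camera
    warped, valid = bilinear_sample(src, proj[0] / z, proj[1] / z)
    valid &= in_front
    return warped.reshape(H, W), valid.reshape(H, W)

def photometric_l1(tgt, src, depth_t, K, T_t2s):
    """Mean absolute photometric reprojection error over pixels that land inside the source image."""
    warped, valid = warp_source_to_target(src, depth_t, K, T_t2s)
    if not valid.any():
        raise ValueError("no target pixel projects inside the source image")
    return float(np.abs(tgt - warped)[valid].mean())
```

With an identity pose the warped source equals the target, so the error is zero. With a fronto-parallel scene at depth d and a sideways camera translation t_x, every pixel shifts by f_x * t_x / d columns.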
“…Forecasting of non-semantic targets: The most common forecasting techniques operate on trajectories. They track and anticipate the future position of individual objects, either in 2D or 3D [15,46,16,71]. For instance, Hsieh et al [26] disentangle the position and pose of multiple moving objects, but only on synthetic data.…”
Section: Methods That Anticipate (mentioning)
confidence: 99%
“…However, DFNet does not distinguish between dynamic and static regions when computing its optical flow consistency constraints. For this reason, Casser et al [29] and Dai et al [32] both use a pretrained semantic segmentation network to obtain masks of the dynamic regions. CC [33] also uses neural networks to estimate dynamic and static regions, but instead of relying on a pretrained network, it adds region segmentation to the self-supervised framework in a competitive and cooperative manner.…”
Section: Unsupervised Joint Learning Of Depth Optical Flow and Ego-mo... (mentioning)
confidence: 99%
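One generic way to keep moving objects out of a flow-consistency term is sketched below in NumPy. The sketch assumes the dynamic-region mask is already given; in the works cited above it comes from a pretrained segmentation network. It computes the flow induced by depth and camera ego-motion alone (the rigid flow), compares it with a predicted optical flow, and averages the endpoint error over static pixels only. It does not reproduce the exact loss of DFNet or of Casser et al, and the function names are hypothetical.

```python
import numpy as np

def rigid_flow(depth, K, T_t2s):
    """Flow (H, W, 2) the scene would show if it were static and only the camera moved by T_t2s."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W].astype(float)
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1)  # homogeneous pixels, (3, H*W)
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)   # back-project with depth
    cam = T_t2s[:3, :3] @ cam + T_t2s[:3, 3:4]              # apply camera ego-motion
    proj = K @ cam
    xy = proj[:2] / proj[2:3]                               # assumes all points stay in front of the camera
    return (xy - pix[:2]).T.reshape(H, W, 2)

def static_flow_consistency(flow_pred, flow_rigid, dynamic_mask):
    """Mean endpoint error between predicted and rigid flow, averaged over static pixels only."""
    static = ~dynamic_mask
    if not static.any():
        return 0.0
    epe = np.linalg.norm(flow_pred - flow_rigid, axis=-1)   # per-pixel endpoint error
    return float(epe[static].mean())
```

Passing an all-False mask gives the unmasked variant, in which pixels on moving objects are penalized as if they belonged to the static background.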