“…As the first self-supervised approach to VO, SfMLearner couples depth and pose estimation through image warping, which turns training into the problem of minimizing a photometric loss. Building on this idea, many self-supervised VO methods have been proposed, with modifications to the loss functions [22,26], network architectures [3,4,22,28,40], and predicted contents [39], as well as combinations with classic VO/SLAM [5,38]. For example, GeoNet [39] extends the framework to jointly estimate optical flow, using forward-backward consistency to infer unstable regions, and achieves state-of-the-art performance among self-supervised VO methods.…”
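The coupling of depth and pose through image warping can be illustrated with a minimal NumPy sketch. It is not the SfMLearner implementation: the function names (`warp_source_to_target`, `photometric_loss`), the grayscale inputs, the pinhole intrinsics `K`, and the 4x4 target-to-source pose `T_t2s` are all assumptions made for illustration. The sketch also uses nearest-neighbor sampling and a plain mean absolute difference, whereas a trainable pipeline would need differentiable (for example bilinear) sampling.

```python
import numpy as np


def warp_source_to_target(src, depth, K, T_t2s):
    """Synthesize the target view by sampling the source image.

    src    : (H, W) source image (grayscale, hypothetical input)
    depth  : (H, W) predicted depth of the target view
    K      : (3, 3) pinhole camera intrinsics
    T_t2s  : (4, 4) predicted relative pose, target frame -> source frame
    Returns the warped image (H, W) and a boolean validity mask (H, W).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates of the target view, shape (3, H*W).
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)

    # Back-project target pixels to 3D points using the predicted depth.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])

    # Move the points into the source camera frame and project them.
    proj = K @ (T_t2s @ cam_h)[:3]
    z = proj[2]
    safe_z = np.where(z > 1e-6, z, 1.0)  # avoid division by zero
    ui = np.rint(proj[0] / safe_z).astype(int)
    vi = np.rint(proj[1] / safe_z).astype(int)

    # Keep only points in front of the camera that land inside the image.
    valid = (z > 1e-6) & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    # Nearest-neighbor sampling (a simplification of differentiable sampling).
    warped = np.zeros(H * W, dtype=np.float64)
    warped[valid] = src[vi[valid], ui[valid]]
    return warped.reshape(H, W), valid.reshape(H, W)


def photometric_loss(target, src, depth, K, T_t2s):
    """Mean absolute difference between the target and the warped source."""
    warped, valid = warp_source_to_target(src, depth, K, T_t2s)
    if not valid.any():
        return 0.0
    return float(np.abs(target - warped)[valid].mean())


# Toy usage: a sideways camera shift that moves every pixel by one column.
rng = np.random.default_rng(0)
target = rng.random((6, 8))
src = np.roll(target, 1, axis=1)           # source sees the scene shifted by 1 px
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 3.0], [0.0, 0.0, 1.0]])
depth = np.full(target.shape, 5.0)         # constant depth, chosen for illustration
T_good = np.eye(4)
T_good[0, 3] = 0.5                         # f * tx / depth = 1 pixel shift
print(photometric_loss(target, src, depth, K, T_good))
print(photometric_loss(target, src, depth, K, np.eye(4)))
```

In this toy setup the loss is small only when depth and pose together explain the pixel shift between the two frames, which is the signal that lets both be learned without ground-truth labels.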