2020
DOI: 10.1109/lra.2020.2975750

Flow-Motion and Depth Network for Monocular Stereo and Beyond

Abstract: We propose a learning-based method that solves monocular stereo and can be extended to fuse depth information from multiple target frames. Given two unconstrained images from a monocular camera with known intrinsic calibration, our network estimates relative camera poses and the depth map of the source image. The core contribution of the proposed method is threefold. First, a network is tailored for static scenes that jointly estimates the optical flow and camera motion. By the joint estimation, the optical …

Cited by 18 publications (11 citation statements)
References 25 publications (22 reference statements)
“…Recently, realistic simulators were created for driving scenes, such as CARLA [224], Nvidia Drive Sim, and indoor scenes, such as Habitat [225; 226]. Despite the usage of simulators, other datasets rely on game engines or general computer graphics engines to build their systems, such as SYNTHIA [99], Virtual KITTI [87], and Virtual KITTI 2 [88] that used Unity as graphic engine, and GTA-SfM [142] that uses scenes from the game GTAV.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…GTA-SFM [34]: GTA-SFM is a synthetic dataset rendered from GTA-V, an open-world game with large-scale city models. It contains 200 scenes for training and 19 scenes for testing.…”
Section: Datasets (citation type: mentioning)
confidence: 99%
“…In default, we utilize CasMVSNet [3] as the backbone network. The split of train, valid and test sets in each dataset follows the official configuration in DTU [4], BlendedMVS [33] and GTA-SFM [34]. Since the semi-supervised MVS problem in this paper aims to remedy the urge for large-scale MVS data, we only use limited annotated ground truth during training.…”
Section: Implementation Details (citation type: mentioning)
confidence: 99%
“…The first predicts the optical flow, whereas the second takes this prediction into consideration while inferring depth maps and surface normals. A comparable approach is presented by Wang et al (2020) [25] where the network first jointly estimates optical flow and camera motion. A triangulation layer is then proposed to encode this information and, finally, a depth map is estimated.…”
Section: B Estimating Depth From Motion (citation type: mentioning)
confidence: 99%