2019
DOI: 10.48550/arxiv.1903.04939
Preprint

Fast Deep Stereo with 2D Convolutional Processing of Cost Signatures

Abstract: Modern neural network-based algorithms are able to produce highly accurate depth estimates from stereo image pairs, nearly matching the reliability of measurements from more expensive depth sensors. However, this accuracy comes with a higher computational cost since these methods use network architectures designed to compute and process matching scores across all candidate matches at all locations, with floating point computations repeated across a match volume with dimensions corresponding to both space and d…

Cited by 2 publications (2 citation statements)
References 17 publications
“…On the one hand, methods such as MVSNet [50] and DPSNet [20] build 4D feature volumes [24], and regularize the feature volumes by employing 3D convolutions, a process that delivers high accuracy, but also is demanding in terms of memory and compute. On the other hand, MVDepthNet [48] and [51], directly generate 3D volumes by computing traditional cost measures on image features or RGB values. This allows the network architecture to be based on 2D convolutions, which are much faster than the 3D counterpart, and better suited for real-time applications.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
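
The second approach in the statement above, building a 3D volume from a traditional cost measure, can be sketched as follows. This is a minimal illustration for a rectified stereo pair, not the implementation used by MVDepthNet or by the cited paper. The function name `absdiff_cost_volume`, the mean-absolute-difference cost, and the `(C, H, W)` array layout are all assumptions made for the example.

```python
import numpy as np

def absdiff_cost_volume(left, right, max_disp):
    """Return a (max_disp, H, W) volume of per-pixel matching costs.

    left, right: float arrays of shape (C, H, W), either features or RGB.
    The entry [d, y, x] is the mean absolute difference between
    left[:, y, x] and right[:, y, x - d]. Positions with x < d have no
    candidate match and keep the fill value np.inf.
    """
    c, h, w = left.shape
    if max_disp > w:
        raise ValueError("max_disp must not exceed the image width")
    volume = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # Columns x >= d of the left image align with columns x - d of the right.
        diff = np.abs(left[:, :, d:] - right[:, :, : w - d])
        # Averaging over channels collapses the feature dimension,
        # so the result is 3D (D, H, W) rather than 4D (C, D, H, W).
        volume[d, :, d:] = diff.mean(axis=0)
    return volume
```

Because the channel dimension is reduced away, the result has shape `(D, H, W)` and can be treated as a `D`-channel image by 2D convolutions, whereas a 4D feature volume of shape `(C, D, H, W)` calls for 3D convolutions. The `np.inf` fill only marks positions without a valid match; a network input would need a finite placeholder instead.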
“…However, PSM-Net still uses 3D convolutions and cannot process high-resolution images because of GPU memory constraints. Later methods reduce the computational overhead and/or aim for high-resolution images [11,27,22,24,20,28]. In particular, HSM-Net [27] estimates disparity in a coarse-to-fine manner, and uses novel data augmentation methods to achieve state-of-the-art in mean absolute error (MAE) in the high-resolution Middlebury dataset while running with low latency.…”
Section: Related Work
Citation type: mentioning
confidence: 99%