2021
DOI: 10.48550/arxiv.2112.03243
Preprint

Input-level Inductive Biases for 3D Reconstruction

Abstract: [Figure 1 panel labels: Input image pair; Multiple view geometry inductive biases; flatten(); Input matrix; Queries for image 1; Generalist Perception Model; Output depth for image 1.] Figure 1. Input-level inductive biases. We explore 3D reconstruction using a generalist perception model, the recent Perceiver IO [20], which ingests a matrix of unordered and flattened inputs (e.g. pixels). The model is interrogated using a query matrix and generates an output for every query; in this paper the outputs are depth values for all pixels of the i…

Citations: cited by 1 publication (11 citation statements)
References: 48 publications
“…For the task of optical flow, Perceiver IO feeds positionally encoded images through a Transformer [18], rather than using a cost volume for processing. IIB [58] adapts the Perceiver IO architecture to generalized stereo estimation, proposing a novel epipolar parameterization as an additional input-level inductive bias. Building upon this baseline, we propose a series of geometry-preserving 3D data augmentation techniques designed to promote the learning of a geometrically-consistent latent scene representation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our proposed DeFiNe architecture (Figure 2a) is designed with flexibility in mind, so data from different sources can be used as input and different output tasks can be estimated from the same latent space. Similar to Yifan et al. [58], we use Perceiver IO [18] as our general-purpose Transformer backbone. During the encoding stage, our model takes RGB images from calibrated cameras, with known intrinsics and relative poses.…”
Section: Related Work (mentioning)
confidence: 99%