2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00270
Learning Spatial Common Sense With Geometry-Aware Recurrent Networks

Cited by 56 publications (57 citation statements)
References 17 publications
“…Policies have to be learned to help agents move around a scene; this is the task of active recognition [45,46,47,48,49]. The policy is learned jointly with the other tasks and representations, and it tells the agent where and how to move strategically so as to recognize things faster [50,51].…”
Section: Embodied Visual Recognition
confidence: 99%
“…Their dense grid-like structure makes them well suited to capturing complete geometry. Many previous works have demonstrated the strength of such representations in geometry modeling [8,9,15,61,62,67,78]. Some recent works [27,45,54] have also shown their effectiveness in modeling appearance.…”
Section: Related Work
confidence: 99%
“…However, it requires camera poses computed from SfM for training, and point-cloud estimation can be inaccurate when the test image is out of distribution. Instead of using point clouds, neural 3D representations, including implicit functions [43,59] and deep voxels [13,19,58,47,46], have shown impressive reconstruction and synthesis results. Our work is closely related to the approach proposed by Tung et al. [13], which leverages view prediction for learning a latent 3D voxel structure of the scene. However, camera pose is still required to provide supervision.…”
Section: Related Work
confidence: 99%