2022
DOI: 10.1007/978-3-031-20080-9_13
|View full text |Cite
|
Sign up to set email alerts
|

RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

0
4
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
4
2
1

Relationship

0
7

Authors

Journals

citations
Cited by 12 publications
(4 citation statements)
references
References 55 publications
0
4
0
Order By: Relevance
“…They are either trained on synthetic data [36], or on small real datasets [18,23,24]. Similarly, recent learning-based approaches reconstruct a scene from a video [30,25,26,49,42], and use Scan2CAD as their main evaluation benchmark. Our CAD-Estate dataset can benefit all of these works as it offers new, large-scale, diverse, real video data with annotated complex spatial arrangements of 3D objects into scenes.…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…They are either trained on synthetic data [36], or on small real datasets [18,23,24]. Similarly, recent learning-based approaches reconstruct a scene from a video [30,25,26,49,42], and use Scan2CAD as their main evaluation benchmark. Our CAD-Estate dataset can benefit all of these works as it offers new, large-scale, diverse, real video data with annotated complex spatial arrangements of 3D objects into scenes.…”
Section: Related Workmentioning
confidence: 99%
“…The final goal is to detect all objects in the scene, recognize their class, reconstruct their 3D shape, as well as their pose within the overall scene coordinate frame. With the advances of scalable deep learning techniques, the field has progressed from reconstructing the 3D shape of one object in a simple image with trivial background [32,50,40,33,8,17], to limited reasoning about object arrangements in simple multi-object scenes [36,18,23], and finally to unrestricted multi-object 3D reconstruction in complex real-world scenes [49,30,42,12]. This evolution has been dependent on the availability of ever larger and more diverse data sets for training and evaluation [3,10,6,44,47,7,27,15,16] Existing datasets for Semantic 3D scene understanding fall broadly in two categories: synthetic and acquired from real images/videos.…”
Section: Introductionmentioning
confidence: 99%
“…[4,7,32,34,35,47,55,56] are very similar to these approach with slight variations, all basically doing direct pose regression from a CNN. There are many other examples which use combinations of these approaches and add in depth data [17,22,41,[48][49][50][51], however depth data is outside the scope of the desired solution here.…”
Section: Other Pose Detectorsmentioning
confidence: 99%
“…This lets us avoid complex non-linear optimization and makes the joint estimation amenable to potential on-board implementation. Most important, it lets us fully leverage the structured bias underlying category-level object pose estimation -Scan2CAD [4] -CAD models -RayTran [34] -Voxel Occupancy -Vid2CAD [24] CAD models Moving Avg.…”
Section: Introductionmentioning
confidence: 99%