2021
DOI: 10.1007/978-3-030-92659-5_34

T6D-Direct: Transformers for Multi-object 6D Pose Direct Regression


Cited by 18 publications (7 citation statements)
References 31 publications
“…However, few attempts to use Transformers have been made in 6D object pose estimation. Recently, some studies showed that Transformers achieve competitive performance in 6D object pose estimation as well [28-31,51,52]. PoET [28] is a Transformer-based framework that takes a single RGB image as input and estimates the 6D poses of all objects present in the image without object 3D CAD models.…”
Section: Transformer-based Methods
confidence: 99%
“…The pose estimation problem can be formulated as direct classification [14], regression [40], 2D-3D correspondences [41], or 3D-3D correspondences [42]. The following sections discuss various methods, along with some of the common loss functions associated with them.…”
Section: Problem Formulation
confidence: 99%
“…Classification-based methods usually outperform direct regression methods in pose estimation. Amini et al. [40] formulate pose estimation as a regression problem. In this paper, the loss is formulated as a distance between the projected points in 2D as follows:…”
Section: Regression
confidence: 99%
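The statement above refers to a loss defined on 2D projections of the object's model points; the formula itself is truncated in the excerpt. As a rough, hedged sketch (not the exact formulation from [40]), a reprojection-style loss can be written as the mean 2D distance between model points projected under the predicted and ground-truth poses. The function names, argument shapes, and intrinsics handling below are illustrative assumptions.

```python
import numpy as np

def project(points_3d, R, t, K):
    """Project 3D model points into the image under pose (R, t) with intrinsics K."""
    cam = points_3d @ R.T + t          # (N, 3) points in the camera frame
    uv = cam @ K.T                     # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> (N, 2) pixel coordinates

def reprojection_loss(points_3d, R_pred, t_pred, R_gt, t_gt, K):
    """Mean 2D distance between model points projected under predicted and ground-truth poses."""
    uv_pred = project(points_3d, R_pred, t_pred, K)
    uv_gt = project(points_3d, R_gt, t_gt, K)
    return np.linalg.norm(uv_pred - uv_gt, axis=1).mean()
```

A symmetry-aware variant would take the minimum of this distance over all ground-truth poses that are equivalent under the object's symmetries.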
“…Benefiting from the powerful long-range modeling capability of the multi-head self-attention (MSA) module, the vision transformer (ViT) (Dosovitskiy et al. 2020) and its variants (Touvron et al. 2021; Liu et al. 2021a) have achieved promising performance over convolutional neural networks in a variety of computer vision tasks (Amini, Periyasamy, and Behnke 2021; He et al. 2021). Although ViTs provide an architecture with better feature representation, their high computational cost and large parameter counts restrict their application on resource-limited devices (Chen et al. 2021; Chuanyang et al. 2022).…”
Section: Introduction
confidence: 99%
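The passage above credits the multi-head self-attention (MSA) module for ViT's long-range modeling ability. As a rough illustration only (not taken from any of the cited papers), a minimal NumPy sketch of scaled dot-product self-attention with several heads could look like the following; the projection matrices `W_q`, `W_k`, `W_v`, `W_o` and the tensor shapes are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, W_q, W_k, W_v, W_o, num_heads):
    """Minimal scaled dot-product multi-head self-attention.

    x:   (seq_len, d_model) token embeddings (e.g., ViT patch embeddings)
    W_*: (d_model, d_model) projection matrices, assumed given
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project, then split into heads: (num_heads, seq_len, d_head)
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ W_q), split(x @ W_k), split(x @ W_v)

    # Attention weights model pairwise (long-range) interactions between all tokens.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (heads, seq, seq)
    attn = softmax(scores, axis=-1)

    out = attn @ v                                          # (heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # concatenate heads
    return out @ W_o
```

The quadratic cost of the `scores` matrix in the sequence length is the main source of the computational burden the quoted statement mentions.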