2021
DOI: 10.1007/s12559-021-09966-y
CSA6D: Channel-Spatial Attention Networks for 6D Object Pose Estimation

Abstract: 6D object pose estimation plays a crucial role in robotic manipulation and grasping tasks. Estimating the 6D object pose from RGB or RGB-D images means detecting objects and estimating their orientations and translations relative to given canonical models. RGB-D cameras provide two sensory modalities, RGB and depth images, which can benefit estimation accuracy, but exploiting the two different modality sources remains a challenging issue. In this paper, inspired by recent works on at…
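The abstract refers to channel-spatial attention for fusing the RGB and depth modalities. As a hedged illustration only, the sketch below shows one common way such a block can be built: CBAM-style channel attention followed by spatial attention, applied to concatenated RGB and depth feature maps. The class name, layer sizes, and concatenation-based fusion are assumptions for illustration, not the architecture reported in the paper.

```python
# Minimal sketch of a channel-spatial attention block for RGB-D feature
# fusion (CBAM-style). Names, sizes, and the concatenation fusion are
# illustrative assumptions, not the CSA6D implementation.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: squeeze channels, re-weight spatial positions.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel weights from global average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial weights from per-pixel channel statistics.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(stats))

# Illustrative usage: fuse RGB and depth feature maps by concatenation.
rgb_feat = torch.randn(1, 128, 60, 80)
depth_feat = torch.randn(1, 128, 60, 80)
fused = ChannelSpatialAttention(256)(torch.cat([rgb_feat, depth_feat], dim=1))
```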

Cited by 8 publications (5 citation statements) · References: 41 publications
“…He et al. [12] incorporate a channel-level attention module for adaptive feature fusion into U-Net and compute distances between pixels and keypoints using a prior-distance-augmented loss. Another related architecture, based on a channel-spatial attention network (CSA6D), is proposed by Chen and Gu [40] to estimate the 6D object pose from RGB-D images.…”
Section: Voting-based Methods
Mentioning confidence: 99%
“…Here, we compare our results with 6D pose estimation approaches that use a single RGB image, which represent the state of the art in this research area. The comparisons have been carried out against PVNet [8], DPVL [11], ASPP-DF-PVNet with L+ loss [7], the PDAL-AFAM approach of He et al. (2021) [12], and some earlier approaches such as PoseCNN [5], SSD6D [1], YOLO6D [3], BB8 [29], CDPN [32], DPOD [31], Pix2Pose [33], and CSA6D [40]. The results are evaluated using the ADD(-S) and 2D-projection metrics on the LINEMOD and Occlusion LINEMOD datasets.…”
Section: Comparisons with the State of the Art
Mentioning confidence: 99%
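For context on the evaluation protocol quoted above, the ADD, ADD-S, and 2D-projection metrics are standard in this literature. The sketch below shows how they are typically computed; the function names and the pass/fail thresholds noted in the comments are the usual LINEMOD conventions, not values taken from the cited papers.

```python
# Hedged sketch of the standard ADD, ADD-S, and 2D-projection pose-error
# metrics. A pose is usually counted correct if ADD(-S) is below 10% of the
# object diameter, or if the mean 2D projection error is below 5 pixels.
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform (R, t) to an (N, 3) model point set."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_est, t_est):
    # Mean distance between corresponding transformed model points.
    gt, est = transform(points, R_gt, t_gt), transform(points, R_est, t_est)
    return np.linalg.norm(gt - est, axis=1).mean()

def adds_metric(points, R_gt, t_gt, R_est, t_est):
    # Symmetric variant: each ground-truth point is matched to its closest
    # estimated point (used for symmetric objects).
    gt, est = transform(points, R_gt, t_gt), transform(points, R_est, t_est)
    d = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)
    return d.min(axis=1).mean()

def projection_2d_metric(points, K, R_gt, t_gt, R_est, t_est):
    # Mean pixel distance between model points projected with the
    # ground-truth and the estimated pose, using camera intrinsics K.
    def project(R, t):
        cam = transform(points, R, t) @ K.T
        return cam[:, :2] / cam[:, 2:3]
    return np.linalg.norm(project(R_gt, t_gt) - project(R_est, t_est),
                          axis=1).mean()
```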
“…There has been great progress in reconstructing or estimating the pose of a single hand [KS12, GRL*19, IMB*18, CCY*21, ZLM*19] or of objects [HHFS19, KMT*17, PLH*19, ZSI19, LF20, ZHMW22, LZXQ21, YJLF22, CG22, ZBB21, SHCM21] alone over recent decades. Lacking good datasets that label hands and objects together, early work on hand-object interaction focused on recovering either the hand [RKK09, RKI*14] or the object [TG15] pose in an interaction.…”
Section: Related Work
Mentioning confidence: 99%
“…With recent advances in 3D scanning technologies, it has become convenient to obtain raw 3D data. As a fundamental 3D representation, the point cloud has attracted extensive attention for various 3D applications [1,2]. Recently, researchers have focused on exploiting Convolutional Neural Networks (CNNs) to process 3D point clouds; these methods can generally be categorized into three types: projection-based methods [3,4,5], voxelization-based methods [6,7], and point-based methods [8,9,10,11].…”
Section: Introduction
Mentioning confidence: 99%
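The snippet above groups point cloud networks into projection-based, voxelization-based, and point-based methods. As a small illustrative sketch (not code from any of the cited works), a point-based extractor in the PointNet style applies a shared per-point MLP and then a symmetric max-pool so the descriptor is invariant to point order; the layer sizes below are assumptions.

```python
# Minimal point-based feature extractor: shared per-point MLP + max-pool.
# Layer widths and names are illustrative, not taken from the cited methods.
import torch
import torch.nn as nn

class PointFeatureExtractor(nn.Module):
    def __init__(self, in_dim=3, feat_dim=1024):
        super().__init__()
        # Shared per-point MLP implemented with 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.ReLU(inplace=True),
            nn.Conv1d(64, 128, 1), nn.ReLU(inplace=True),
            nn.Conv1d(128, feat_dim, 1),
        )

    def forward(self, points):
        # points: (batch, num_points, 3) -> features: (batch, feat_dim)
        x = self.mlp(points.transpose(1, 2))
        # Symmetric max-pool over points gives an order-invariant descriptor.
        return x.max(dim=2).values

cloud = torch.randn(2, 2048, 3)                    # random clouds, for shape only
global_feature = PointFeatureExtractor()(cloud)    # -> (2, 1024)
```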