Skeleton Merger: an Unsupervised Aligned Keypoint Detector

Shi, Ruoxi; Xue, Zhengrong; You, Yang; Lu, Cewu

doi:10.1109/cvpr46437.2021.00011

Cited by 31 publications

(28 citation statements)

References 25 publications

“…The authors introduce a distance metric on the target domain that incorporates intra-domain neighbor similarity and inter-domain label adaptation regions [24]. Based on autoencoder architecture, Shi et al proposed an unsupervised keypoint detector called Skeleton Merger [25]. The authors claimed Skeleton Merger could detect semantically rich and neatly aligned salient key points.…”

Section: Related Work (mentioning)

confidence: 99%

A Convolutional Neural-Network-Based Training Model to Estimate Actual Distance of Persons in Continuous Images

Tsai

Modales

Lin

2022

Sensors

Distance and depth detection plays a crucial role in intelligent robotics. It enables drones to understand their working environment to avoid collisions and accidents immediately and is very important in various AI applications. Image-based distance detection usually relies on the correctness of geometric information. However, the geometric features will be lost when the object is rotated or the camera lens image is distorted. This study proposes a training model based on a convolutional neural network, which uses a single-lens camera to estimate humans’ distance in continuous images. We can partially restore depth information loss using built-in camera parameters that do not require additional correction. The normalized skeleton feature unit vector has the same characteristics as time series data and can be classified very well using a 1D convolutional neural network. According to our results, the accuracy for the occluded leg image is over 90% at 2 to 3 m, 80% to 90% at 4 m, and 70% at 5 to 6 m.

“…3D keypoints. The use of 3D keypoints for control is extensively studied in computer vision [33], [17], [41], [29], robotics [20], [19], [13], and reinforcement learning [36], [4]. However, we find that none of the existing methods shown in Table I meets all the requirements we listed that are beneficial to the task of generalizable robotic manipulation.…”

Section: A Object Representations For Manipulation (mentioning)

confidence: 99%

“…All the "labels" are pseudo ground-truth labels generated by the teacher network, free from any additional human annotations. The PointNet++ [24] module is with fixed parameters, extracted from a pre-trained Skeleton Merger [29]. The SPRIN [44] network is to be optimized in the training process.…”

Section: Student Network (mentioning)

confidence: 99%

“…The teacher network. The PointNet++ [24] encoder in the teacher network is extracted from Skeleton Merger [29], a state-of-the-art category-level keypoint detector, to produce a weight matrix W ∈ R^{K×N}. The multiplication of the weight matrix and the input point cloud directly gives the predicted keypoints…”

Section: B Useek: a Teacher-student Framework (mentioning)

confidence: 99%
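
The weight-matrix step quoted above can be illustrated with a minimal NumPy sketch. It only shows the shape arithmetic: a K×N matrix W times an N×3 point cloud yields K predicted keypoints. The function name `predict_keypoints`, the random stand-in for the encoder output, and the row-wise softmax normalization are assumptions made for illustration, not details confirmed by the quoted statement.

```python
import numpy as np


def predict_keypoints(weights: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Return K keypoints as weighted combinations of N input points.

    weights: (K, N) matrix W; points: (N, 3) point cloud.
    """
    if weights.shape[1] != points.shape[0]:
        raise ValueError("W must have one column per input point")
    return weights @ points  # (K, N) @ (N, 3) -> (K, 3)


# Hypothetical stand-in for the encoder output: random scores per keypoint.
rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))   # N = 1024 input points
scores = rng.normal(size=(10, 1024))  # K = 10 keypoints

# Assumption: each row of W is normalized with a softmax so it sums to 1.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

keypoints = predict_keypoints(weights, points)
print(keypoints.shape)
```

Under the normalization assumed here, every keypoint is a convex combination of the input points, so it stays inside the bounding box of the cloud.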

“…To enable intra-category any-pose manipulation, an object representation that achieves category-level generalization is crucial. Existing representations can be roughly classified into three kinds: 6-DOF pose estimators [39], [38], [37], [40], [18], 3D keypoints [33], [29], [19], [20], [4], and dense correspondence models [28], [12], [32], [31]. Despite the disparities in form, their ultimate goals are consistent: to determine the local coordinate frame of the object.…”

Section: Introduction (mentioning)

confidence: 99%
