3D object instance recognition and pose estimation using triplet loss with dynamic margin

Zakharov, Sergey; Kehl, Wadim; Planche, Benjamin; Hutter, Andreas; Ilić, Slobodan

doi:10.1109/iros.2017.8202207

Cited by 48 publications (37 citation statements)

References 17 publications (33 reference statements)

“…For the LineMOD dataset [38] we consider two state of the art approaches [45], [43]. In [43] a convolutional network is used to map the image space to a descriptor space where the pose and object classes are predicted through a nearest neighbour classifier.…”

Section: Results (mentioning; confidence: 99%)
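The retrieval scheme described in [43] can be pictured with a small sketch. It assumes that the descriptors have already been produced by the network, that every stored template view carries its object label and pose (assumed here to be a unit quaternion), and that plain Euclidean distance is used. The names `Template` and `nearest_neighbor_lookup` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Template:
    """A stored view: its learned descriptor plus the labels it carries (hypothetical structure)."""
    descriptor: Tuple[float, ...]
    object_id: str
    pose: Tuple[float, float, float, float]  # assumed unit quaternion (w, x, y, z)


def nearest_neighbor_lookup(query: Sequence[float], database: List[Template]) -> Template:
    """Return the template whose descriptor is closest (Euclidean) to the query descriptor."""
    if not database:
        raise ValueError("template database is empty")
    return min(database, key=lambda t: math.dist(query, t.descriptor))


# Toy database: two views of object "a" and one view of object "b".
db = [
    Template((0.0, 0.0, 1.0), "a", (1.0, 0.0, 0.0, 0.0)),
    Template((0.0, 1.0, 0.0), "a", (0.7071, 0.7071, 0.0, 0.0)),
    Template((1.0, 0.0, 0.0), "b", (1.0, 0.0, 0.0, 0.0)),
]
# The test descriptor inherits both the object label and the pose of its nearest template.
match = nearest_neighbor_lookup((0.1, 0.9, 0.05), db)
print(match.object_id, match.pose)  # a (0.7071, 0.7071, 0.0, 0.0)
```

The brute-force search scans every template, so its cost grows linearly with the size of the database; this is the search cost that the Bui et al. abstract further down proposes to avoid by regressing the pose directly.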

“…In [43] a convolutional network is used to map the image space to a descriptor space where the pose and object classes are predicted through a nearest neighbour classifier. The method in [45] builds upon [43], introducing a triplet loss function with a dynamic margin. These works employ a slightly different settings than ours since they use synthetic images.…”

Section: Results (mentioning; confidence: 99%)

“…Some methods tested on this dataset consider the use of both synthetic and real images [45], [46]. In this paper, in order to have a simpler experimental setup, we follow [47] and we consider a setting where only real images are used for training and testing the models.…”

Section: A. The RGB-D Triathlon Dataset (mentioning; confidence: 99%)

The RGB-D Triathlon: Towards Agile Visual Toolboxes for Robots

Cermelli, Mancini, Ricci, et al. 2019

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Deep networks have brought significant advances in robot perception, enabling to improve the capabilities of robots in several visual tasks, ranging from object detection and recognition to pose estimation, semantic scene segmentation and many others. Still, most approaches typically address visual tasks in isolation, resulting in overspecialized models which achieve strong performances in specific applications but work poorly in other (often related) tasks. This is clearly sub-optimal for a robot which is often required to perform simultaneously multiple visual recognition tasks in order to properly act and interact with the environment. This problem is exacerbated by the limited computational and memory resources typically available onboard to a robotic platform. The problem of learning flexible models which can handle multiple tasks in a lightweight manner has recently gained attention in the computer vision community and benchmarks supporting this research have been proposed. In this work we study this problem in the robot vision context, proposing a new benchmark, the RGB-D Triathlon, and evaluating state of the art algorithms in this novel challenging scenario. We also define a new evaluation protocol, better suited to the robot vision setting. Results shed light on the strengths and weaknesses of existing approaches and on open issues, suggesting directions for future research.

“…pulling viewpoints under similar poses close together and pushing dissimilar ones or different objects further away. As appeared in [13], m corresponds to a dynamic margin defined as:…”

Section: B. Descriptor Learning (mentioning; confidence: 99%)
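The quoted passage breaks off before the margin is defined, so what follows is only a sketch under labelled assumptions, not the definition from [13]. It assumes a ratio-form triplet loss, max(0, 1 - d_neg / (d_pos + m)), where d_pos and d_neg are the squared descriptor distances from the anchor to the positive and negative samples. It further assumes a margin m equal to the rotation angle between anchor and negative when both show the same object, and a fixed constant larger than pi when they do not. The names `quaternion_angle`, `dynamic_margin`, `triplet_loss` and `DIFFERENT_OBJECT_MARGIN`, and the value 4.0, are hypothetical.

```python
import math
from typing import Hashable, Sequence

# Hypothetical margin for negatives showing a different object. It is assumed to be
# larger than pi, the largest angle the same-object branch can return.
DIFFERENT_OBJECT_MARGIN = 4.0


def quaternion_angle(q1: Sequence[float], q2: Sequence[float]) -> float:
    """Rotation angle in radians between two unit quaternions.

    Taking |q1 . q2| treats q and -q as the same rotation.
    """
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))  # clamp guards against rounding above 1


def dynamic_margin(anchor_obj: Hashable, anchor_q: Sequence[float],
                   neg_obj: Hashable, neg_q: Sequence[float]) -> float:
    """Assumed margin: pose angle for same-object negatives, a constant otherwise."""
    if anchor_obj == neg_obj:
        return quaternion_angle(anchor_q, neg_q)
    return DIFFERENT_OBJECT_MARGIN


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_anchor: Sequence[float], f_pos: Sequence[float],
                 f_neg: Sequence[float], margin: float, eps: float = 1e-12) -> float:
    """Assumed ratio-form triplet loss max(0, 1 - d_neg / (d_pos + margin)).

    The loss reaches zero once the negative descriptor is at least
    d_pos + margin (plus eps) away from the anchor, so a larger margin pushes it further.
    """
    d_pos = squared_distance(f_anchor, f_pos)
    d_neg = squared_distance(f_anchor, f_neg)
    return max(0.0, 1.0 - d_neg / (d_pos + margin + eps))


# Usage: the same descriptor triplet under a same-object and a different-object margin.
q_anchor = (1.0, 0.0, 0.0, 0.0)
q_negative = (math.sqrt(0.5), math.sqrt(0.5), 0.0, 0.0)  # 90 degrees about x, w-first assumed
m_same = dynamic_margin("duck", q_anchor, "duck", q_negative)
m_diff = dynamic_margin("duck", q_anchor, "cat", q_negative)
print(m_same, triplet_loss((0.0, 0.0), (0.1, 0.0), (1.0, 0.0), m_same))
print(m_diff, triplet_loss((0.0, 0.0), (0.1, 0.0), (1.0, 0.0), m_diff))
```

Because m grows with the pose difference, views of the same object under very different poses are pushed further apart than near-identical views, while views of different objects always receive the largest margin. This matches the pulling and pushing behaviour the passage describes.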

“…To analyze not only our method, but the effect of multitask learning, i.e. regression and learning robust feature descriptors together, we report the results compared to the baseline method [13]. Here we train on the loss function L_d to compare to the results obtained by nearest neighbor pose retrieval, abbreviated as NN.…”

Section: Baseline Models (mentioning; confidence: 99%)

When Regression Meets Manifold Learning for Object Recognition and Pose Estimation

Bui, Zakharov, Albarqouni, et al. 2018

2018 IEEE International Conference on Robotics and Automation (ICRA)

Self-citation

In this work, we propose a method for object recognition and pose estimation from depth images using convolutional neural networks. Previous methods addressing this problem rely on manifold learning to learn low dimensional viewpoint descriptors and employ them in a nearest neighbor search on an estimated descriptor space. In comparison we create an efficient multi-task learning framework combining manifold descriptor learning and pose regression. By combining the strengths of manifold learning using triplet loss and pose regression, we could either estimate the pose directly reducing the complexity compared to NN search, or use learned descriptor for the NN descriptor matching. By in depth experimental evaluation of the novel loss function we observed that the view descriptors learned by the network are much more discriminative resulting in almost 30% increase regarding relative pose accuracy compared to related works. On the other hand, regarding directly regressed poses we obtained important improvement compared to simple pose regression. By leveraging the advantages of both manifold learning and regression tasks, we are able to improve the current state-of-the-art for object recognition and pose retrieval that we demonstrate through in depth experimental evaluation.
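The multi-task idea in this abstract, one network trained jointly on a descriptor (manifold) loss and a pose-regression loss, can be sketched as a weighted sum. This is an illustration under assumptions, not the paper's loss: the regression term is assumed to be the squared L2 distance between the normalized predicted quaternion and the ground truth (taking the smaller of the q and -q distances), and the weight `reg_weight` and all function names are hypothetical. `descriptor_loss` stands for L_d, for example a triplet loss value.

```python
import math
from typing import Sequence, Tuple


def normalize(q: Sequence[float]) -> Tuple[float, ...]:
    """Scale a raw 4-vector network output to a unit quaternion."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in q)


def pose_regression_loss(q_pred: Sequence[float], q_true: Sequence[float]) -> float:
    """Assumed regression term: squared L2 distance between unit quaternions.

    The minimum over q_true and -q_true avoids penalizing the sign flip,
    since both represent the same rotation.
    """
    p, t = normalize(q_pred), normalize(q_true)
    d_same = sum((a - b) ** 2 for a, b in zip(p, t))
    d_flip = sum((a + b) ** 2 for a, b in zip(p, t))
    return min(d_same, d_flip)


def multitask_loss(descriptor_loss: float, q_pred: Sequence[float],
                   q_true: Sequence[float], reg_weight: float = 1.0) -> float:
    """Hypothetical combined objective L = L_d + reg_weight * L_reg."""
    return descriptor_loss + reg_weight * pose_regression_loss(q_pred, q_true)


# At test time the regression head yields a pose directly, with no template search:
raw_output = (2.0, 0.0, 0.0, 0.0)
print(normalize(raw_output))  # (1.0, 0.0, 0.0, 0.0)
print(multitask_loss(0.5, raw_output, (1.0, 0.0, 0.0, 0.0), reg_weight=0.1))  # 0.5
```

The same network can still output its descriptor for nearest-neighbour matching, which is the second inference route the abstract mentions.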

Semi-supervised Pathology Segmentation with Disentangled Representations

Jiang, Chartsias, Zhang, et al. 2020

Lecture Notes in Computer Science

Automated pathology segmentation remains a valuable diagnostic tool in clinical practice. However, collecting training data is challenging. Semi-supervised approaches by combining labelled and unlabelled data can offer a solution to data scarcity. An approach to semisupervised learning relies on reconstruction objectives (as self-supervision objectives) that learns in a joint fashion suitable representations for the task. Here, we propose Anatomy-Pathology Disentanglement Network (APD-Net), a pathology segmentation model that attempts to learn jointly for the first time: disentanglement of anatomy, modality, and pathology. The model is trained in a semi-supervised fashion with new reconstruction losses directly aiming to improve pathology segmentation with limited annotations. In addition, a joint optimization strategy is proposed to fully take advantage of the available annotations. We evaluate our methods with two private cardiac infarction segmentation datasets with LGE-MRI scans. APD-Net can perform pathology segmentation with few annotations, maintain performance with different amounts of supervision, and outperform related deep learning methods.
