Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large-scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimates using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https
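The symmetry handling described above can be made concrete with a short sketch. The NumPy code below is a minimal, non-batched illustration of a ShapeMatch-style loss: it scores a predicted rotation by the average squared distance from each rotated model point to its nearest point under the ground-truth rotation, so rotations that are equivalent under the object's symmetry incur no penalty. The function names and the (w, x, y, z) quaternion convention here are our own assumptions for illustration, not the released PoseCNN code.

```python
import numpy as np

def quat_to_rot(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def shapematch_loss(q_pred, q_gt, points):
    """Symmetry-aware rotation loss: for each model point rotated by the
    predicted quaternion, take the squared distance to the *nearest* point
    of the model rotated by the ground-truth quaternion, then average."""
    P = points @ quat_to_rot(q_pred).T   # (m, 3) points under predicted rotation
    G = points @ quat_to_rot(q_gt).T     # (m, 3) points under ground-truth rotation
    # pairwise squared distances between the two rotated point sets
    d2 = ((P[:, None, :] - G[None, :, :]) ** 2).sum(-1)
    return 0.5 * d2.min(axis=1).mean()
```

For a cylinder-symmetric object, for example, a prediction rotated 180° about the symmetry axis matches the ground-truth point set point-for-point and yields zero loss, whereas a naive quaternion regression loss would penalize it heavily.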
Abstract: This paper introduces DART, a general framework for tracking articulated objects composed of rigid bodies connected through a kinematic tree. DART covers a broad set of objects encountered in indoor environments, including furniture and tools, as well as human and robot bodies, hands, and manipulators. To achieve efficient and robust tracking, DART extends the signed distance function representation to articulated objects and takes full advantage of highly parallel GPU algorithms for data association and pose optimization. We demonstrate the capabilities of DART on different types of objects that have each required dedicated tracking techniques in the past.

I. INTRODUCTION

The ability to accurately track the pose of objects in real time is of fundamental importance to many areas of robotics. Applications range from navigation to planning, manipulation, and human-robot interaction, all of which have received the attention of researchers working within a state-space, model-based paradigm in both computer vision and robotics. The class of objects that can be described as collections of rigid bodies chained together through a kinematic tree is quite broad, including furniture, tools, human bodies, human hands, and robot manipulators. Tracking articulated bodies from a single viewpoint, without instrumenting the object of interest, still presents a significant challenge: the single viewpoint and occlusions, including self-occlusion, limit the amount of information available for pose estimation. Noisy sensor data and approximate object models pose additional problems. Finally, the objects being tracked can be highly dynamic and have many degrees of freedom, making real-time tracking difficult.

Early articulated model-based tracking techniques relied on tracking 2D features such as image edges on a CPU [8, 4]. Recently introduced depth cameras, along with highly parallel algorithms optimized for modern GPUs, have enabled new algorithms for tracking complex 3D objects in real time. Examples include KinectFusion and related efforts for 3D mapping [23, 16, 34], human body pose tracking [29, 35, 15], and articulated hand tracking [24, 19, 26]. These approaches were developed for specific application domains and have not been demonstrated or tested on multiple tracking applications. Their application-specific nature enables their authors to show excellent performance by taking advantage of domain-specific features and constraints, but it also prevents them from serving as general tools for tracking arbitrary articulated objects. Techniques have also been developed to
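The key representational idea, extending signed distance functions to articulated objects, can be sketched in a few lines. The snippet below is a minimal CPU illustration under our own assumptions, not DART's actual GPU implementation: each rigid body stores an SDF in its local frame, and the articulated SDF at a world-frame query point is the minimum over the per-part distances after mapping the point through the current kinematic pose. Data association then amounts to assigning each observed depth point to the part achieving that minimum.

```python
import numpy as np

def articulated_sdf(point, parts):
    """Signed distance from `point` (world frame) to an articulated model.

    `parts` is a list of (T_world_from_part, sdf_fn) pairs: a 4x4 rigid
    transform produced by the kinematic chain for the current joint pose,
    and a signed distance function defined in that part's local frame.
    The articulated SDF is the minimum over the per-part distances."""
    best = np.inf
    for T_world_from_part, sdf_fn in parts:
        # map the world-frame query point into the part's local frame
        p_local = np.linalg.inv(T_world_from_part)[:3] @ np.append(point, 1.0)
        best = min(best, sdf_fn(p_local))
    return best

# Hypothetical two-part model: two spheres of radius 5 cm, 10 cm apart.
sphere = lambda p, r=0.05: np.linalg.norm(p) - r
T0, T1 = np.eye(4), np.eye(4)
T1[0, 3] = 0.10
print(articulated_sdf(np.array([0.05, 0.0, 0.0]), [(T0, sphere), (T1, sphere)]))
# -> 0.0: the query point lies exactly on the first sphere's surface
```

In a real tracker the per-part SDFs would be precomputed voxel grids rather than analytic functions, and the minimization over parts would run in parallel on the GPU across all depth pixels.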