Robot-Supervised Learning for Object Segmentation

Florence, Victoria; Corso, Jason J.; Griffin, Brent

doi:10.1109/icra40945.2020.9196543

Cited by 8 publications

(10 citation statements)

References 25 publications

“…3), and we outperform recent self-supervised approaches with similar weak constraints on the setup by a large margin. Despite not using camera calibration, manipulator key point registration and additional depth data like [9] we still reach similar performance on a joint subset of objects (82.72% vs. 84.61% mIoU).…”

Section: A. Evaluation of Self-Supervised Object Segmentation (mentioning)

confidence: 78%
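The figures quoted above (82.72% vs. 84.61%) are mean intersection-over-union (mIoU) scores. As a hedged illustration, assuming binary masks stored as nested lists of 0/1 values and an unweighted mean over objects (the exact averaging used by the citing paper is not stated here), mIoU can be computed as follows; the function names are hypothetical.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = object pixel, 0 = background


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Unweighted mean of per-object IoU scores (assumed averaging scheme)."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same, non-zero number of predictions and ground truths")
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

For example, a prediction covering 2 of the 4 pixels in the union with its ground truth scores 0.5, and averaging it with a perfect mask gives an mIoU of 0.75.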

“…We hypothesize that due to the thin shape of the object the network finds it difficult to establish correspondences between two consecutive frames and identify it as moving object. This would also explain the superior performance of [9] on this item, as their approach does not rely on identification by motion.…”

Section: A. Evaluation of Self-Supervised Object Segmentation (mentioning)

confidence: 96%

“…Instead, [8] propose a self-supervised approach for pixelwise robot recognition: By projecting a robot model onto the image plane and simultaneously optimizing a GrabCut-based cost function, segmentation labels of the robot arm are obtained. Florence et al. [9] build upon this framework and extend it to grasped object segmentation: They segment the foreground by projecting link positions of the robot into the camera frame which aids a graph-based depth segmentation followed by an additional refinement in RGB space. With this approach they collect segmentation masks of the manipulator to learn a robot arm representation.…”

Section: A. Self-Supervised Grasped Object Segmentation (mentioning)

confidence: 99%
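The quoted description of [9] relies on projecting robot link positions into the camera image. The sketch below shows only that projection step under an ideal pinhole model without lens distortion, assuming known intrinsics (fx, fy, cx, cy) and a known base-to-camera rigid transform (R, t), which in practice would come from calibration. All names are hypothetical; this is not the authors' code, and it omits their graph-based depth segmentation and RGB refinement.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def to_camera(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Rigid transform of a point from the robot base frame to the camera frame: R p + t."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Ideal pinhole projection (no lens distortion) of a camera-frame point to pixels (u, v)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def project_links(links_base: List[Vec3], R: Mat3, t: Vec3,
                  fx: float, fy: float, cx: float, cy: float) -> List[Tuple[float, float]]:
    """Project each link position (e.g., from forward kinematics) into the image."""
    return [project(to_camera(R, t, p), fx, fy, cx, cy) for p in links_base]
```

The projected pixel locations would then serve as seeds or priors for a segmentation step; how [9] uses them exactly is described only at the level of the quote above.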

“What’s This?” - Learning to Segment Unknown Objects from Manipulation Sequences

Boerdijk

Sundermeyer

Durner

et al. 2021

2021 IEEE International Conference on Robotics and Automation (ICRA)

We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator. Our method successively learns an agnostic foreground segmentation followed by a distinction between manipulator and object solely by observing the motion between consecutive RGB frames. In contrast to previous approaches, we propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge. Furthermore, while the motion of the manipulator and the object are substantial cues for our algorithm, we present means to robustly deal with distraction objects moving in the background, as well as with completely static scenes. Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data. By extensive experimental evaluation we demonstrate the superiority of our framework and provide detailed insights on its capability of dealing with the aforementioned extreme cases of motion. We also show that training a semantic segmentation network with the automatically labeled data achieves results on par with manually annotated training data. Code and pretrained model are available at https://github.com/DLR-RM/DistinctNet.

“…where $W$ are the trainable network parameters, $d_1$ is the ground truth object depth at $z_1$ (6), and $f_d \in \mathbb{R}$ is the predicted depth. To use the normalized distance input $z$ (10), we modify (11) and define a normalized depth loss as…”

Section: Normalized Relative Depth Loss (mentioning)

confidence: 99%

“…Alternatively, RGB cameras are less expensive and more ubiquitous than 3D sensors, and there are many more datasets and methods based on RGB images [8,17,27]. Thus, even when 3D sensors are available, RGB images remain a critical modality for understanding data and identifying objects [11,52].…”

Section: Introduction (mentioning)

confidence: 99%

Learning Object Depth from Camera Motion and Video Object Segmentation

Griffin

Corso

2020

Preprint

Self Cite

Video object segmentation, i.e., the separation of a target object from background in video, has made significant progress on real and challenging videos in recent years. To leverage this progress in 3D applications, this paper addresses the problem of learning to estimate the depth of segmented objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry). We achieve this by, first, introducing a diverse, extensible dataset and, second, designing a novel deep network that estimates the depth of objects using only segmentation masks and uncalibrated camera movement. Our data-generation framework creates artificial object segmentations that are scaled for changes in distance between the camera and object, and our network learns to estimate object depth even with segmentation errors. We demonstrate our approach across domains using a robot camera to locate objects from the YCB dataset and a vehicle camera to locate obstacles while driving.
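The premise of this abstract, that camera motion plus segmentation masks constrain object depth, follows from the pinhole model: an object's image size scales inversely with its depth. As a hedged worked example (not the paper's learned network), assume the camera moves a known distance Δz straight toward a static object, and the mask's linear size (taken here as the square root of its pixel area) is s_1 before and s_2 after the move. Then s_1 d_1 = s_2 d_2 with d_2 = d_1 − Δz, which gives d_1 = s_2 Δz / (s_2 − s_1); the focal length cancels out. Function names are hypothetical.

```python
import math


def mask_scale(area_px: float) -> float:
    """Linear size of a segmentation mask, taken as the square root of its pixel area."""
    return math.sqrt(area_px)


def depth_from_two_masks(area1: float, area2: float, dz: float) -> float:
    """Object depth at the first frame, in the same units as dz.

    Assumes an ideal pinhole camera translated by dz along its optical axis
    toward a static object, so that scale * depth stays constant:
        s1 * d1 = s2 * (d1 - dz)  =>  d1 = s2 * dz / (s2 - s1)
    """
    s1, s2 = mask_scale(area1), mask_scale(area2)
    if s2 <= s1:
        raise ValueError("the mask must grow when the camera approaches the object")
    return s2 * dz / (s2 - s1)
```

For instance, a mask that grows from 100 to 400 pixels (linear scale doubling) after a 0.5 m approach implies the object started 1.0 m away. Real masks are noisy, which is one motivation for learning the estimate instead of using this closed form directly.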

Learning Object Depth from Camera Motion and Video Object Segmentation

Griffin

Corso

2020

Lecture Notes in Computer Science

Self Cite

This paper addresses the problem of learning to estimate the depth of detected objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry). We achieve this by 1) designing a recurrent neural network (DBox) that estimates the depth of objects using a generalized representation of bounding boxes and uncalibrated camera movement and 2) introducing the Object Depth via Motion and Detection Dataset (ODMD). ODMD training data are extensible and configurable, and the ODMD benchmark includes 21,600 examples across four validation and test sets. These sets include mobile robot experiments using an end-effector camera to locate objects from the YCB dataset and examples with perturbations added to camera motion or bounding box data. In addition to the ODMD benchmark, we evaluate DBox in other monocular application domains, achieving state-of-the-art results on existing driving and robotics benchmarks and estimating the depth of objects using a camera phone.
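To illustrate the geometry behind estimating depth from bounding boxes and camera motion, here is a hedged least-squares sketch over n ≥ 2 observations. It is not DBox, which the abstract describes as a recurrent neural network. Assumptions: the camera translates only along its optical axis, z_i is its position at frame i (increasing toward the object), and s_i is one bounding-box dimension in pixels. The pinhole model gives s_i d_i = c with d_i = d_n + (z_n − z_i), a linear system in the final depth d_n and the constant c. Names are hypothetical.

```python
from typing import Sequence, Tuple


def depth_least_squares(sizes: Sequence[float], z: Sequence[float]) -> Tuple[float, float]:
    """Fit final object depth d_n and constant c from n >= 2 box observations.

    Each observation gives the linear equation
        sizes[i] * d_n - c = -sizes[i] * (z[-1] - z[i]),
    solved here through the closed-form 2x2 normal equations.
    """
    n = len(sizes)
    if n < 2 or len(z) != n:
        raise ValueError("need at least two observations with matching positions")
    b = [-s * (z[-1] - zi) for s, zi in zip(sizes, z)]
    sum_s = sum(sizes)
    sum_s2 = sum(s * s for s in sizes)
    sum_b = sum(b)
    sum_sb = sum(s * bi for s, bi in zip(sizes, b))
    det = n * sum_s2 - sum_s ** 2
    if abs(det) < 1e-12:
        raise ValueError("box sizes do not change, so depth is unobservable")
    # Rows of A are (sizes[i], -1) and the unknowns are x = (d_n, c);
    # invert A^T A = [[sum_s2, -sum_s], [-sum_s, n]] analytically.
    d_n = (n * sum_sb - sum_s * sum_b) / det
    c = (sum_s * sum_sb - sum_s2 * sum_b) / det
    return d_n, c
```

With noise-free boxes of width 50, 66.67, and 100 pixels at camera positions 0, 0.5, and 1.0 m, the fit recovers a final depth of 1.0 m. Least squares spreads box-size noise over all frames instead of relying on a single pair.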
