Active 3D Segmentation through Fixation of Previously Unseen Objects

2013 IEEE/RSJ International Conference on Intelligent Robots and Systems

Bekiroglu, Högman et al. 2013

Self Cite

Abstract: Object shape information is an important parameter in robot grasping tasks. However, it may be difficult to obtain accurate models of novel objects due to incomplete and noisy sensory measurements. In addition, object shape may change due to frequent interaction with the object (cereal boxes, etc.). In this paper, we present a probabilistic approach for learning object models based on visual and tactile perception through physical interaction with an object. Our robot explores unknown objects by touching them strategically at parts that are uncertain in terms of shape. The robot starts by using only visual features to form an initial hypothesis about the object shape, then gradually adds tactile measurements to refine the object model. Our experiments involve ten objects of varying shapes and sizes in a real setup. The results show that our method is capable of choosing a small number of touches to construct object models similar to real object shapes and to determine similarities among acquired models.

Section: A Visual Measurements (mentioning)

confidence: 99%

Enhancing visual perception of shape through tactile glances

“…There are several methods exploiting this approach [1,6,17], but [1,17] are computationally expensive. Since we aim at real-time performance, we build upon our original work in [6,7], which, contrary to the other two approaches, has the additional advantage of being easily extendable to handle multiple objects simultaneously, as demonstrated in [3]. Similarly to our approach, [2,18] use an iterative approach, but require a human expert for guidance.…”

Section: Related Work (mentioning)

confidence: 99%

“…In order to generate a hypothesis, this block requires that at least one pixel in the image is labeled as belonging to an object. We use the method described in [7] to identify this point. The output is a dense labeling L A t of every pixel in the image and a model of the appearance of each detected object.…”

Section: System Overview (mentioning)

confidence: 99%

Scene Understanding through Autonomous Interactive Perception

Lecture Notes in Computer Science

Bergström et al. 2011

Self Cite

Abstract. We propose a framework for detecting, extracting and modeling objects in natural scenes from multi-modal data. Our framework is iterative, exploiting different hypotheses in a complementary manner. We employ the framework in realistic scenarios, based on visual appearance and depth information. Using a robotic manipulator that interacts with the scene, object hypotheses generated using appearance information are confirmed through pushing. Each generated hypothesis feeds into the subsequent one, continuously refining the predictions about the scene. We show results that demonstrate the synergistic effect of applying multiple hypotheses for real-world scene understanding. The method is efficient and performs in real time.

“…The intention has been to keep the required information limited, making the applicability of the system as wide as possible. In an earlier version of the system [62], single foreground parts were always expected to be found in the center of view. This was made possible by letting an attention mechanism control the camera system, placing the detected regions of interest in the center after a view change.…”

Section: Initialization (mentioning)

confidence: 99%

Detecting, segmenting and tracking unknown objects using multi-label MRF inference

Computer Vision and Image Understanding

Bergström, Kragić 2014

Self Cite

This article presents a unified framework for detecting, segmenting and tracking unknown objects in everyday scenes, allowing for inspection of object hypotheses during interaction over time. A heterogeneous scene representation is proposed, with background regions modeled as a combination of planar surfaces and uniform clutter, and foreground objects as 3D ellipsoids. Recent energy minimization methods based on loopy belief propagation, tree-reweighted message passing and graph cuts are studied for the purpose of multi-object segmentation and benchmarked in terms of segmentation quality, computational speed, and how easily the methods can be adapted for parallel processing. One conclusion is that the choice of energy minimization method is less important than the way scenes are modeled. Proximities are more valuable for segmentation than similarity in colors, while the benefit of 3D information is limited. It is also shown through practical experiments that, with implementations on GPUs, multi-object segmentation and tracking using state-of-the-art MRF inference methods is feasible, despite the computational costs typically associated with such methods.