Detachable Object Detection: Segmentation and Depth Ordering from Short-Baseline Video

Ayvaci, Alper; Soatto, Stefano

doi:10.1109/tpami.2011.271

Cited by 27 publications

(29 citation statements)

References 42 publications


“…This approach only exploits the appearance cues, is applicable to a single view setting and more suitable in the context of image based retrieval applications, where images of scenes are well composed, containing little clutter. Our approach is motivated by work of [3], which explicitly reasons about evidence of occlusions boundaries extracted from optical flow and relative depth ordering cues. Also related to our work are several attempts to discover objects in urban scenes.…”

Section: Related Work (mentioning)

confidence: 99%

Recursive Inference for Prediction of Objects in Urban Environments

Cadena

Košecká

2016

Springer Tracts in Advanced Robotics


Future advancements in robotic navigation and mapping rest to a large extent on robust, efficient and more advanced semantic understanding of the surrounding environment. The existing semantic mapping approaches typically consider small number of semantic categories, require complex inference or large number of training examples to achieve desirable performance. In the proposed work we present an efficient approach for predicting locations of generic objects in urban environments by means of semantic segmentation of a video into object and nonobject categories. We exploit widely available exemplars of non-object categories (such as road, buildings, vegetation) and use geometric cues which are indicative of the presence of object boundaries to gather the evidence about objects regardless of their category. We formulate the object/non-object semantic segmentation problem in the Conditional Random Field framework, where the structure of the graph is induced by a minimum spanning tree computed over a 3D point cloud, yielding an efficient algorithm for an exact inference. The chosen 3D representation naturally lends itself for on-line recursive belief updates with a simple soft data association mechanism. We carry out extensive experiments on videos of urban environments acquired by a moving vehicle and show quantitatively and qualitatively the benefits of our proposal.


“…Southey and Little [8] provide another example of a live-video system, combining stereo vision with optical flow techniques to segment manipulable objects in video, and visual features to group these segments. Ayvaci and Soatto [9] use motion in video to find occlusion cues which are integrated to partition the image into depth layers. Sivic et al [10] do frame-to-frame tracking in video, and aggregate groups of points that move together to segment objects.…”

Section: Prior Work (mentioning)

confidence: 99%

Object disappearance for object discovery

Mason

Marthi

Parr

2012

2012 IEEE/RSJ International Conference on Intelligent Robots and Systems


Abstract: A useful capability for a mobile robot is the ability to recognize the objects in its environment that move and change (as distinct from background objects, which are largely stationary). This ability can improve the accuracy and reliability of localization and mapping, enhance the ability of the robot to interact with its environment, and facilitate applications such as inventory management and theft detection. Rather than viewing this task as a difficult application of object recognition methods from computer vision, this work is in line with a recent trend in the community towards unsupervised object discovery and tracking that exploits the fundamentally temporal nature of the data acquired by a robot. Unlike earlier approaches, which relied heavily upon computationally intensive techniques from mapping and computer vision, our approach combines visual features and RGB-D data in a simple and effective way to segment objects from robot sensory data. We then use a Dirichlet process to cluster and recognize objects. The performance of our approach is demonstrated in several test domains.


“…Gestalt principles [33] provide grouping criteria: continuity, regularity, proximity, compactness, the last of which (figure/ground, or occlusion) is best informed by video. Occlusions have been used extensively for grouping [32,5,8,3]. A feature of [3] is that grouping is obtained via a linear program: local ordering constraints provided by occluder/occluded relations are integrated to globally partition the image domain into depth layers.…”

Section: Introduction (mentioning)

confidence: 99%

Causal video object segmentation from persistence of occlusions

Taylor

Karasev

Soatto

2015

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)



Figure 1: Sample outcomes of our scheme: background c(x) = 0 (gray) and foreground layers c(x) = 1, c(x) = 2, c(x) = 3. On the far right, our algorithm correctly infers that the bag strap is in front of the woman's arm, which is in front of her trunk, which is in front of the background. Project page: http://vision.ucla.edu/cvos/

Abstract: Occlusion relations inform the partition of the image domain into "objects" but are difficult to determine from a single image or short-baseline video. We show how long-term occlusion relations can be robustly inferred from video, and used within a convex optimization framework to segment the image domain into regions. We highlight the challenges in determining these occluder/occluded relations and ensuring regions remain temporally consistent, propose strategies to overcome them, and introduce an efficient numerical scheme to perform the partition directly on the pixel grid, without the need for superpixelization or other preprocessing steps.
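The citation statement above describes local occluder/occluded relations being integrated into a global partition into depth layers. The following is a minimal sketch of that idea at the level of whole regions, not the linear program of the cited paper and not the convex scheme of this abstract: it swaps in a plain relaxation of the difference constraints c[occluder] >= c[occluded] + 1. The function name `depth_layers` and the input format (pairs of region indices) are assumptions made for illustration.

```python
def depth_layers(num_regions, occlusions):
    """Assign an integer layer to each region from pairwise occlusion relations.

    occlusions: list of (occluder, occluded) region-index pairs
                (hypothetical input format, not taken from the cited papers).
    Returns the smallest non-negative layers c such that
    c[occluder] >= c[occluded] + 1 for every pair.
    Raises ValueError if the relations are cyclic.
    """
    layers = [0] * num_regions
    # Repeatedly raise any occluder that is not yet above what it occludes.
    # A consistent (acyclic) set of relations stops changing within
    # num_regions passes; one extra pass confirms convergence.
    for _ in range(num_regions + 1):
        changed = False
        for occluder, occluded in occlusions:
            if layers[occluder] < layers[occluded] + 1:
                layers[occluder] = layers[occluded] + 1
                changed = True
        if not changed:
            return layers
    raise ValueError("occlusion relations are cyclic; no consistent ordering")


# Regions numbered as in the Figure 1 example:
# 0 = background, 1 = trunk, 2 = arm, 3 = bag strap.
relations = [(3, 2), (2, 1), (1, 0)]
print(depth_layers(4, relations))
```

With the three relations above, each region receives a layer one higher than the region it occludes, so the background stays at layer 0 and the bag strap ends up on top. Real pipelines work with noisy, per-pixel occlusion evidence, which is why the cited works formulate the problem as an optimization instead of a hard constraint check like this one.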
