Image Parsing: Unifying Segmentation, Detection, and Recognition

Tu, Zhuowen; Chen, Xiangrong; Yuille, Alan; Zhu, Song‐Chun

doi:10.1007/s11263-005-6642-x

Cited by 474 publications

(367 citation statements)

References 36 publications

Supporting

Mentioning

364

Contrasting


“…Compared with past computational modeling, previous models have used part-based object recognition (31-33) and combined BU with TD processing (18, 19, 34-36). However, past models did not study or report results on part recognition, did not examine the limitations of feed-forward models for part recognition, and did not demonstrate the contribution of a fast TD process to part detection and localization.…”

Section: Discussion (mentioning)

confidence: 99%

Image interpretation by a single bottom-up top-down cycle

Epshtein

Lifshitz

Ullman

2008

Proc. Natl. Acad. Sci. U.S.A.


The human visual system recognizes objects and their constituent parts rapidly and with high accuracy. Standard models of recognition by the visual cortex use feed-forward processing, in which an object's parts are detected before the complete object. However, parts are often ambiguous on their own and require the prior detection and localization of the entire object. We show how a cortical-like hierarchy obtains recognition and localization of objects and parts at multiple levels nearly simultaneously by a single feed-forward sweep from low to high levels of the hierarchy, followed by a feedback sweep from high- to low-level areas.

computer vision | object recognition | parts interpretation | cortical hierarchy | feedback processing

In the course of visual object recognition, we quickly recognize not only complete objects but also parts and subparts at different levels of detail. Hierarchical models of the visual cortex (1-3) typically perform recognition in a feed-forward manner in which recognition proceeds from the detection of simple features to more complex parts to the full object. However, the recognition of local parts is often ambiguous and depends on the object's context (Fig. 1), which is not available during feedforward processing.

Psychological studies have also shown that the identification of a global shape and its local components proceed at similar speeds. Depending on the configuration, the global shape can either precede or follow the recognition of its local parts, and both contribute to final recognition (4, 5). Event-related potential (ERP) (6) and magnetoencephalography (MEG) (7) recordings have shown fast responses to both objects and parts, and physiological studies found that shape selectivity at different cortical levels emerges quickly and can sometimes further increase over a short time interval (8-11).

We show below how objects and their multilevel components can be detected by the cortical hierarchy efficiently and almost simultaneously, even when the local parts on their own are highly ambiguous. Unlike feed-forward models, the basic computation is a particular bottom-up (BU) top-down (TD) cycle. Feedforward recognition was shown in past modeling to produce fast effective top-level recognition. However, we show that even when correct recognition is obtained by the BU pass, frequent errors occur at the parts level. A single TD pass is sufficient to correct almost all errors made during the BU pass, and the full cycle obtains not only object recognition but a detailed interpretation of the entire figure at multiple levels of details. We first describe below the computational model used for object and part recognition and then report testing results on natural images.

Bidirectional Hierarchical Model. In this section, we consider the problem of detecting an object C together with a set P of parts


“…The large diversity of image segment types has increased the urge to devise a unified segmentation approach. Tu et al [13] provided such a unified probabilistic framework, which enables to "plug-in" a wide variety of parametric models capturing different segment types. While their framework elegantly unifies these parametric models, it is restricted to a predefined set of segment types, and each specific object/segment type (e.g., faces, text, texture etc.)…”

Section: Fig 3 Notations (mentioning)

confidence: 99%

“…Our work builds on top of [14], providing a general segment quality score and a corresponding image segmentation algorithm, which applies to a large diversity of segment types, and can be applied for various segmentation tasks. Although general, our unified segmentation framework does not require any pre-definition or modelling of segment types (in contrast to the unified framework of [13]). …”

Section: Basic Concept - "Segmentation By Composition" (mentioning)

confidence: 99%

What Is a Good Image Segment? A Unified Approach to Segment Extraction

Bagon

Boiman

Irani

2008

Lecture Notes in Computer Science


Abstract. There is a huge diversity of definitions of "visually meaningful" image segments, ranging from simple uniformly colored segments, textured segments, through symmetric patterns, and up to complex semantically meaningful objects. This diversity has led to a wide range of different approaches for image segmentation. In this paper we present a single unified framework for addressing this problem: "Segmentation by Composition". We define a good image segment as one which can be easily composed using its own pieces, but is difficult to compose using pieces from other parts of the image. This non-parametric approach captures a large diversity of segment types, yet requires no pre-definition or modelling of segment types, nor prior training. Based on this definition, we develop a segment extraction algorithm, i.e., given a single point-of-interest, provide the "best" image segment containing that point. This induces a figure-ground image segmentation, which applies to a range of different segmentation tasks: single image segmentation, simultaneous co-segmentation of several images, and class-based segmentations.


“…As can be seen, the top-down segmentation is better than any of the bottom-up segmentations but still misses important details. In recent years, several authors have therefore suggested combining top-down and bottom-up segmentation [2,21,17,6]. Borenstein et al [2] choose among a discrete set of possible low-level segmentations by minimizing a cost function that includes a bias towards the top-down segmentation.…”

Section: Introduction (mentioning)

confidence: 99%

“…Borenstein et al [2] choose among a discrete set of possible low-level segmentations by minimizing a cost function that includes a bias towards the top-down segmentation. In the image parsing framework of Tu et al [17] object-specific detectors serve as a proposal distribution for a data-driven Monte-Carlo sampling over possible segmentations. In the OBJ-CUT algorithm [6] a layered pictorial structure is used to define a bias term for a graph-cuts energy minimization algorithm (the energy favors segmentation boundaries occurring at image discontinuities).…”

Section: Introduction (mentioning)

confidence: 99%

Learning to Combine Bottom-Up and Top-Down Segmentation

Levin

Weiss

2008

Int J Comput Vis


Abstract. Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has led to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine both top-down and bottom-up cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train a fragment-based segmentation algorithm which takes into account both bottom-up and top-down cues simultaneously, in contrast to most existing algorithms which train top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high quality segmentations.
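Illustrative note: the citation statement above describes the image parsing framework of Tu et al. as using object-specific detectors as a proposal distribution for data-driven Monte Carlo sampling. The Python sketch below illustrates only that general idea, with an independence Metropolis-Hastings sampler over three made-up interpretations of an image region. It is not the authors' algorithm: the function name `independence_mh`, the state labels, the posterior weights, and the "detector" proposal probabilities are all hypothetical values chosen for the example.

```python
import random
from collections import Counter


def independence_mh(target, proposal, n_steps, seed=0):
    """Metropolis-Hastings with a fixed (independence) proposal.

    target:   dict mapping state -> unnormalized posterior weight (> 0)
    proposal: dict mapping state -> proposal probability (> 0, sums to 1);
              here it stands in for normalized detector scores (assumption)
    """
    rng = random.Random(seed)
    states = list(proposal)
    weights = [proposal[s] for s in states]
    current = rng.choices(states, weights=weights)[0]
    samples = []
    for _ in range(n_steps):
        # Propose a new state from the data-driven proposal.
        candidate = rng.choices(states, weights=weights)[0]
        # Acceptance ratio for an independence proposal:
        # p(x') q(x) / (p(x) q(x'))
        ratio = (target[candidate] * proposal[current]) / (
            target[current] * proposal[candidate]
        )
        if rng.random() < min(1.0, ratio):
            current = candidate
        samples.append(current)
    return samples


# Hypothetical toy problem: three candidate labels for one region.
target = {"face": 6.0, "text": 3.0, "texture": 1.0}
detector_proposal = {"face": 0.5, "text": 0.4, "texture": 0.1}

samples = independence_mh(target, detector_proposal, n_steps=20000)
freq = {s: c / len(samples) for s, c in Counter(samples).items()}
```

In this toy setting the empirical frequencies in `freq` should approach the normalized target weights. A proposal that already resembles the posterior is what keeps the acceptance rate high, which is the usual motivation for letting detectors drive the proposals.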

