Abstract. We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2D image plane while modeling spatial layout and context. A randomized decision forest combines many such features to achieve a coherent 2D segmentation and recognize the object categories present. Our main contribution is to show how semantic segmentation is possible based solely on motion-derived 3D world structure. Our method works well on sparse, noisy point clouds, and unlike existing approaches, does not need appearance-based descriptors. Experiments were performed on a challenging new video database containing sequences filmed from a moving car in daylight and at dusk. The results confirm that indeed, accurate segmentation and recognition are possible using only motion and 3D world structure. Further, we show that the motion-derived information complements an existing state-of-the-art appearance-based method, improving both qualitative and quantitative performance.

[Fig. 1 panels: input video frame; reconstructed 3D point cloud; automatic segmentation]

Fig. 1. The proposed algorithm uses 3D point clouds estimated from videos such as the pictured driving sequence (with ground truth inset). Having trained on point clouds from other driving sequences, our new motion and structure features, based purely on the point cloud, perform 11-class semantic segmentation of each test frame. The colors in the ground truth and inferred segmentation indicate category labels.