This chapter introduces and analyzes a method for registering multimodal images of scenes that contain occluding objects. An analysis of multimodal image registration gives insight into the limitations of the assumptions made in current approaches and motivates the methodology of the developed algorithm. Using calibrated stereo imagery, we maximize mutual information within sliding correspondence windows that inform a disparity voting algorithm, demonstrating successful registration of objects in color and thermal imagery in the presence of significant occlusion. Extensive testing on scenes with multiple objects at different depths and levels of occlusion shows high rates of successful registration. Ground truth experiments demonstrate the utility of disparity voting techniques for multimodal registration, yielding qualitative and quantitative results that outperform approaches that do not consider occlusions. A framework for tracking with the registered multimodal features is also presented and experimentally validated.
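The core operation named above, maximizing mutual information between corresponding windows of the two modalities to select a disparity, can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names, window size, bin count, and disparity convention (a feature at column x in image A appears at column x - d in image B) are assumptions chosen for the sketch.

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """Histogram-based mutual information between two equally sized patches.

    MI is used instead of direct intensity correlation because color and
    thermal values are related only statistically, not linearly.
    """
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    joint /= joint.sum()
    pa = joint.sum(axis=1)   # marginal of patch_a
    pb = joint.sum(axis=0)   # marginal of patch_b
    nz = joint > 0           # avoid log(0)
    return float(np.sum(joint[nz] *
                        np.log(joint[nz] / (pa[:, None] * pb[None, :])[nz])))

def best_disparity(band_a, band_b, x, win=8, max_disp=16):
    """Slide a window along the epipolar band of image B and return the
    candidate disparity whose window maximizes MI with the reference
    window of image A centered at column x."""
    ref = band_a[:, x - win:x + win]
    scores = []
    for d in range(max_disp):
        lo, hi = x - d - win, x - d + win
        if lo < 0:                      # window would fall off the image
            scores.append(-np.inf)
            continue
        scores.append(mutual_information(ref, band_b[:, lo:hi]))
    return int(np.argmax(scores))
```

In a disparity voting scheme, each window position contributes its winning disparity as a vote for the columns it covers, and the per-column vote tally selects the final disparity, which makes the result robust where occlusion corrupts individual windows.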
Introduction

Computer vision applications are increasingly using multimodal imagery to obtain and process information about a scene. Specifically, the disparate yet complementary nature of visual and thermal imagery has been used in recent works to obtain additional information and robustness [1,2]. The use of both types of imagery yields information about the scene that is rich in color, depth, motion, and thermal detail. Such information can then be used to successfully detect, track, and analyze people and objects in the scene.

To associate the information from each modality, corresponding data in each image must be successfully registered. In long-range surveillance applications [2], the cameras are assumed to be oriented in such a way that a global alignment

R.I. Hammoud (ed.), Augmented Vision Perception in Infrared: Algorithms and 321