This chapter introduces and analyzes a method for registering multimodal images of scenes that contain occluding objects. An analysis of multimodal image registration gives insight into the limitations of the assumptions made in current approaches and motivates the methodology of the developed algorithm. Using calibrated stereo imagery, we maximize mutual information within sliding correspondence windows that inform a disparity voting algorithm, demonstrating successful registration of objects in color and thermal imagery in the presence of significant occlusion. Extensive testing on scenes with multiple objects at different depths and levels of occlusion shows high rates of successful registration. Ground truth experiments demonstrate the utility of disparity voting techniques for multimodal registration, yielding qualitative and quantitative results that outperform approaches that do not consider occlusions. A framework for tracking with the registered multimodal features is also presented and experimentally validated.
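The core operation named above, maximizing mutual information between corresponding windows of the two modalities to select a disparity, can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names, window size, bin count, and disparity convention (a feature at column x in image A appears at column x - d in image B) are assumptions chosen for the sketch.

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """Histogram-based mutual information between two equally sized patches.

    MI is used instead of direct intensity correlation because color and
    thermal values are related only statistically, not linearly.
    """
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    joint /= joint.sum()
    pa = joint.sum(axis=1)   # marginal of patch_a
    pb = joint.sum(axis=0)   # marginal of patch_b
    nz = joint > 0           # avoid log(0)
    return float(np.sum(joint[nz] *
                        np.log(joint[nz] / (pa[:, None] * pb[None, :])[nz])))

def best_disparity(band_a, band_b, x, win=8, max_disp=16):
    """Slide a window along the epipolar band of image B and return the
    candidate disparity whose window maximizes MI with the reference
    window of image A centered at column x."""
    ref = band_a[:, x - win:x + win]
    scores = []
    for d in range(max_disp):
        lo, hi = x - d - win, x - d + win
        if lo < 0:                      # window would fall off the image
            scores.append(-np.inf)
            continue
        scores.append(mutual_information(ref, band_b[:, lo:hi]))
    return int(np.argmax(scores))
```

In a disparity voting scheme, each window position contributes its winning disparity as a vote for the columns it covers, and the per-column vote tally selects the final disparity, which makes the result robust where occlusion corrupts individual windows.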
Introduction

Computer vision applications are increasingly using multimodal imagery to obtain and process information about a scene. Specifically, the disparate yet complementary nature of visual and thermal imagery has been used in recent works to obtain additional information and robustness [1,2]. The use of both types of imagery yields information about the scene that is rich in color, depth, motion, and thermal detail. Such information can then be used to successfully detect, track, and analyze people and objects in the scene.

To associate the information from each modality, corresponding data in each image must be successfully registered. In long-range surveillance applications [2], the cameras are assumed to be oriented in such a way that a global alignment

R.I. Hammoud (ed.), Augmented Vision Perception in Infrared: Algorithms and 321