This paper presents a method for optimally combining pixel information from an infra-red thermal imaging camera and a conventional visible-spectrum colour camera in order to track a moving target. The tracking algorithm rapidly re-learns its background models for each camera modality from scratch at every frame. This enables, firstly, automatic adjustment of the relative importance of thermal and visible information in decision making and, secondly, a degree of "camouflage target" tracking, achieved by continuously re-weighting, at each frame, the importance of those parts of the target model that are most distinct from the present background. Furthermore, this very rapid background adaptation ensures robustness to large, sudden and arbitrary camera motion, which makes the method a useful tool for robotics, for example in visual servoing of a pan-tilt turret mounted on a moving robot vehicle. The method can be used to track any kind of arbitrarily shaped or deforming object; however, the combination of thermal and visible information proves particularly useful for enabling robots to track people. The method is also important in that it can be readily extended to fuse data from an arbitrary number of statistically independent features drawn from one or arbitrarily many imaging modalities.
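To make the fusion idea above concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes that each feature (for example thermal intensity or colour) has been quantised into integer histogram bins, that a normalised target histogram is already available for each feature, and that the background histogram is re-learned from the whole current frame. The names `learn_histogram`, `fused_log_ratio` and the constant `EPS` are hypothetical. Under the independence assumption, the per-feature log-likelihood ratios are simply summed, so a feature in which the target resembles the background contributes almost nothing to the decision.

```python
import numpy as np

EPS = 1e-6  # hypothetical smoothing constant, avoids log(0)


def learn_histogram(bins, n_bins):
    """Normalised histogram of quantised feature values in [0, n_bins)."""
    counts = np.bincount(bins.ravel(), minlength=n_bins).astype(float)
    return counts / counts.sum()


def fused_log_ratio(feature_maps, target_hists, n_bins):
    """Sum per-feature log-likelihood ratios (target vs. background).

    Assumes the features are statistically independent, so their
    log-ratios add. The background model is rebuilt from the current
    frame only; nothing is carried over from earlier frames.
    """
    total = np.zeros(feature_maps[0].shape, dtype=float)
    for bins, p_target in zip(feature_maps, target_hists):
        p_background = learn_histogram(bins, n_bins)
        ratio = np.log((p_target + EPS) / (p_background + EPS))
        total += ratio[bins]  # look up each pixel's bin
    return total


# Synthetic 8x8 frame: the target is a warm 2x2 patch in the thermal
# feature but has the same colour bin as the background (camouflaged).
n_bins = 4
thermal = np.zeros((8, 8), dtype=int)
thermal[2:4, 2:4] = 3
visible = np.ones((8, 8), dtype=int)

target_thermal = np.array([0.0, 0.0, 0.0, 1.0])
target_visible = np.array([0.0, 1.0, 0.0, 0.0])

score = fused_log_ratio(
    [thermal, visible], [target_thermal, target_visible], n_bins
)
target_mask = score > 0  # pixels more likely target than background
```

In this toy frame the visible feature yields a log-ratio of zero everywhere, because the target and background colour distributions coincide, so the decision rests entirely on the thermal feature. No explicit modality weights are set by hand; the re-learned background histograms determine each feature's influence in every frame.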