We develop a Bayesian modeling approach for tracking people in 3D from monocular video with unknown cameras. Modeling in 3D provides natural explanations for occlusions and smoothness discontinuities that result from projection, and allows priors on velocity and smoothness to be grounded in physical quantities (meters and seconds rather than pixels and frames). We pose the problem in the context of data association, in which observations are assigned to tracks. A correct application of Bayesian inference to multitarget tracking must address the fact that the model's dimension changes as tracks are added or removed; as a result, posterior densities of hypotheses with different numbers of tracks are not directly comparable. We address this by marginalizing out the trajectory parameters, so that the resulting posterior over data associations has constant dimension. This is made tractable by using (a) Gaussian process priors for smooth trajectories and (b) approximately Gaussian likelihood functions. Our approach provides a principled method for incorporating multiple sources of evidence; we present results using both optical flow and object detector outputs. Our results are comparable to those of recent work on 3D tracking, and, unlike other approaches, our method does not require pre-calibrated cameras.
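The marginalization step can be illustrated in one dimension. With a Gaussian process prior on a trajectory and a Gaussian likelihood for its observations, the trajectory integrates out in closed form: the positions assigned to one track are jointly Gaussian with covariance K + σ²I. The Python sketch below scores a data association by summing these per-track log marginal likelihoods, so every hypothesis is a density over the same fixed set of observations regardless of how many tracks it posits. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one spatial coordinate per observation, a zero-mean GP with a squared-exponential kernel, and i.i.d. Gaussian noise. The function names and hyperparameter values are hypothetical, and the terms a full model would need for track births, clutter, and missed detections are omitted.

```python
import math


def se_kernel(t1, t2, amplitude=1.0, length_scale=1.0):
    """Squared-exponential covariance between two time stamps (assumed kernel)."""
    return amplitude ** 2 * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def track_log_marginal(times, ys, noise_std=0.1, **kernel_args):
    """log N(ys | 0, K + noise_std^2 I): the trajectory itself is integrated out."""
    n = len(times)
    cov = [[se_kernel(times[i], times[j], **kernel_args)
            + (noise_std ** 2 if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    # Forward substitution solves low @ z = ys, so ys^T cov^{-1} ys = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))


def association_log_score(tracks, **kwargs):
    """Score one data association; tracks is a list of (times, ys) pairs."""
    return sum(track_log_marginal(t, y, **kwargs) for t, y in tracks)


# Two smooth tracks versus the association that swaps their middle points.
smooth = [([0, 1, 2], [0.0, 0.5, 1.0]), ([0, 1, 2], [3.0, 2.5, 2.0])]
swapped = [([0, 1, 2], [0.0, 2.5, 1.0]), ([0, 1, 2], [3.0, 0.5, 2.0])]
params = dict(amplitude=2.0, length_scale=2.0, noise_std=0.1)
print(association_log_score(smooth, **params) > association_log_score(swapped, **params))  # True
```

Because both hypotheses explain the same six observations with the same time stamps, their scores are directly comparable, and the smooth assignment wins because the zig-zag of the swapped one is poorly explained by a smooth kernel.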
We present a general model for tracking smooth trajectories of multiple targets in complex data sets, where tracks potentially cross each other many times. As the number of overlapping trajectories grows, exploiting smoothness becomes increasingly important to disambiguate the association of successive points. However, in many important problems an effective parametric model for the trajectories does not exist. Hence we propose modeling trajectories as independent realizations of Gaussian processes with kernel functions that allow for arbitrary smooth motion. Our generative statistical model accounts for the data as coming from an unknown number of such processes, together with expectations for noise points and the probability that points are missing. For inference we compare two methods: a modified version of the Markov chain Monte Carlo data association (MCMCDA) method, and a Gibbs sampling method, which is much simpler and faster and gives better results because it searches the solution space more efficiently. In both cases, we compare our results against the smoothing provided by linear dynamical systems (LDS). We test our approach on videos of birds and fish, and on 82 image sequences of pollen tubes growing in a petri dish, each with up to 60 tubes with multiple crossings. We achieve 93% accuracy on image sequences with up to ten trajectories (35 sequences) and 88% accuracy when there are more than ten (42 sequences). This performance surpasses that of an LDS motion model and far exceeds that of a simple heuristic tracker.
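To make the Gibbs sampling idea concrete, here is a minimal Python sketch of a sampler over point-to-track labels. It resamples each point's label given all others, choosing among existing tracks, a new track, or noise, with weights proportional to the exponentiated joint score. Everything here is an assumption for illustration rather than the authors' method: positions are 1D, each track is a zero-mean GP with a squared-exponential kernel, noise points get a constant hypothetical log-density (`log_noise`), each track pays a constant hypothetical log-penalty (`log_new_track`), a track may claim at most one point per frame, and missing-point terms are left out.

```python
import math
import random

NOISE = -1  # label for clutter points that belong to no trajectory


def log_marginal(times, ys, amp=2.0, ell=2.0, noise=0.1):
    """GP log marginal likelihood of one track (squared-exponential kernel, assumed)."""
    n = len(times)
    c = [[amp ** 2 * math.exp(-0.5 * ((times[i] - times[j]) / ell) ** 2)
          + (noise ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):  # Cholesky factorisation c = low @ low^T
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    z = []
    for i in range(n):  # forward substitution: low @ z = ys
        z.append((ys[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return (-0.5 * sum(v * v for v in z) - sum(math.log(low[i][i]) for i in range(n))
            - 0.5 * n * math.log(2.0 * math.pi))


def association_score(points, labels, log_noise=-20.0, log_new_track=-10.0):
    """Joint log score of a labelling; points are (frame, position) pairs."""
    tracks = {}
    score = 0.0
    for (t, y), lab in zip(points, labels):
        if lab == NOISE:
            score += log_noise
        else:
            tracks.setdefault(lab, []).append((t, y))
    for members in tracks.values():
        frames = [t for t, _ in members]
        if len(set(frames)) < len(frames):
            return -math.inf  # a track may claim at most one point per frame
        score += log_new_track + log_marginal(frames, [y for _, y in members])
    return score


def gibbs_sweep(points, labels, rng, **score_args):
    """Resample each point's label given the labels of all other points."""
    labels = list(labels)
    for i in range(len(points)):
        others = sorted({labels[j] for j in range(len(points))
                         if j != i and labels[j] != NOISE})
        candidates = others + [max(labels) + 1, NOISE]  # existing, new track, noise
        scores = []
        for lab in candidates:
            labels[i] = lab
            scores.append(association_score(points, labels, **score_args))
        best = max(scores)
        weights = [math.exp(s - best) for s in scores]
        labels[i] = rng.choices(candidates, weights=weights, k=1)[0]
    return labels


def run_gibbs(points, sweeps=10, seed=0, **score_args):
    rng = random.Random(seed)
    labels = [NOISE] * len(points)  # start with every point labelled as noise
    for _ in range(sweeps):
        labels = gibbs_sweep(points, labels, rng, **score_args)
    return labels


# Two well-separated smooth tracks observed over four frames.
points = [(t, 0.3 * t) for t in range(4)] + [(t, 5.0 - 0.3 * t) for t in range(4)]
print(run_gibbs(points))
```

Each update only needs the joint score of a handful of candidate labellings, which is what keeps a sampler of this kind simple; a real tracker would cache per-track factorisations instead of recomputing every track's marginal for every candidate.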