Postprint
This is the accepted version of a paper published in IEEE transactions on circuits and systems for video technology (Print). This paper has been peer-reviewed but does not include the final publisher proof corrections or journal pagination.
Citation for the original published paper (version of record): Ros, E. (2015) Real-time Pose Detection and Tracking of Hundreds of Objects.
IEEE transactions on circuits and systems for video technology (Print).
Access to the published version may require subscription.

Abstract: We propose a novel model-based method for tracking the six-degrees-of-freedom (6DOF) pose of a very large number of rigid objects in real-time. By combining dense motion and depth cues with sparse keypoint correspondences, and by feeding back information from the modeled scene to the cue extraction process, the method is both highly accurate and robust to noise and occlusions. A tight integration of the graphical and computational capabilities of graphics processing units (GPUs) allows the method to simultaneously track hundreds of objects in real-time. We achieve pose updates at frame rates of around 40 Hz when using 500,000 data samples to track 150 objects in images of resolution 640×480. We introduce a synthetic benchmark dataset with varying objects, background motion, noise, and occlusions that enables the evaluation of stereo-vision-based pose estimators in complex scenarios. Using this dataset and a novel evaluation methodology, we show that the proposed method greatly outperforms state-of-the-art methods. Finally, we demonstrate excellent performance on challenging real-world sequences involving multiple objects being manipulated.