This is the accepted version of the paper.This version of the publication may differ from the final published version. Abstract: Hand pose is emerging as an important interface for human-computer interaction. This paper presents a data-driven method to estimate a high-quality depth map of a hand from a stereoscopic camera input by introducing a novel superpixelbased regression framework that takes advantage of the smoothness of the depth surface of the hand. To this end, we introduce Conditional Regressive Random Forest (CRRF), a method that combines a Conditional Random Field (CRF) and a Regressive Random Forest (RRF) to model the mapping from a stereo RGB image pair to a depth image. The RRF provides a unary term that adaptively selects different stereo-matching measures as it implicitly determines matching pixels in a coarse-to-fine manner. While the RRF makes depth prediction for each super-pixel independently, the CRF unifies the prediction of depth by modeling pair-wise interactions between adjacent superpixels. Experimental results show that CRRF can generate a depth image more accurately than the leading contemporary techniques using an inexpensive stereo camera.
Permanent repository link