The highest-quality 3D face reconstructions are produced by multi-view stereo methods, which report errors below 0.5 mm. Unfortunately, these methods typically employ dozens of high-resolution cameras in a large laboratory capture gantry. In contrast, monocular 3D face reconstruction methods based on sophisticated deep learning models are suited to casual mobile phone imaging outside the lab and report a mean error of 1-2 mm. This paper investigates whether classic stereo methods can be used in scenarios where only a few low-resolution images are available. We expected to find that they cannot, since multi-view stereo performs well only when many high-resolution images are provided. When only two low-resolution images are available, stereo produces very noisy results that are not directly usable. Surprisingly, however, our analysis shows that this visually noisy data has lower error than state-of-the-art comparison methods. We find that the visual artifacts from stereo can be removed by using a morphable face model to constrain face shape.