The computer graphics and vision communities have dedicated long‐standing efforts to building computerized tools for reconstructing, tracking, and analyzing human faces based on visual input. Over the past years, rapid progress has been made, leading to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB‐D camera. The range of applications is vast and steadily growing as these technologies continue to improve in speed, accuracy, and ease of use.
Motivated by this rapid progress, this state‐of‐the‐art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance‐based animation to real‐time facial reenactment. We focus our discussion on methods where the central task is to recover and track a three‐dimensional model of the human face using optimization‐based reconstruction algorithms. We provide an in‐depth overview of the underlying concepts of real‐world image formation, and we discuss common assumptions and simplifications that make these algorithms practical. In addition, we extensively cover the priors that are used to better constrain the under‐constrained monocular reconstruction problem, and discuss the optimization techniques that are employed to recover dense, photo‐geometric 3D face models from monocular 2D data. Finally, we discuss a variety of use cases for the reviewed algorithms in the context of motion capture, facial animation, and image and video editing.