The problem of estimating the 3D shape of human faces from single images is of great interest and has attracted considerable research effort. Many recently proposed approaches can be considered extensions of Shape-from-Shading (SFS) methods, in which a 3D shape is optimized so that its 2D renderings match the input images [1,5,7]. Other methods infer the 3D face shape by fitting a set of feature points between the 2D image and the 3D model [3,4,6]. In this paper, we propose the Two-Fold Coupled Structure Learning (2FCSL) algorithm, which reconstructs 3D face models from a sparse set of 2D landmarks that can be localized automatically by most recently proposed landmark detectors. By explicitly incorporating 3D-2D pose estimation and formulating the task as a two-fold coupled structure learning problem, our method achieves better robustness to arbitrary pose variations and to landmark localization noise.
Using a shape vector representation $Y_i^{3D}$ of the dense 3D face, $N$ 3D training faces are stacked together to construct the 3D dense landmark (3DDL) model; analogously, the 3D sparse landmark (3DSL) model $\chi_s^{3D} = (X_1^{3D}, \dots, X_N^{3D})$ is built, where $X_i^{3D}$ is the vector representation of $M$ 3D landmarks. Given a 2D image, a sparse set of landmarks $X_I^{2D}$ is first detected with any off-the-shelf detector. Then, the 3D-2D projection matrix $P$ is estimated by least-squares minimization such that $X_I^{2D} = P\bar{X}^{3D}$, where $\bar{X}^{3D}$ is the mean of the 3DSLs in the training database.
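The least-squares pose step can be illustrated with a short NumPy sketch. It assumes (this is not stated in the text) that $P$ is a 2×4 affine camera acting on homogeneous 3D landmarks, and that landmarks are stored column-wise as a (2, M) array of detections and a (3, M) array holding $\bar{X}^{3D}$. The function names are hypothetical.

```python
import numpy as np


def estimate_affine_projection(x2d: np.ndarray, x3d: np.ndarray) -> np.ndarray:
    """Least-squares 2x4 affine camera P such that x2d ~= P @ [x3d; 1].

    x2d: (2, M) detected 2D landmarks; x3d: (3, M) mean 3D landmarks.
    Assumes an affine camera model, which is an illustrative choice.
    """
    if x2d.shape[1] != x3d.shape[1]:
        raise ValueError("2D and 3D landmark counts differ")
    x3d_h = np.vstack([x3d, np.ones((1, x3d.shape[1]))])  # (4, M) homogeneous
    # Solve x3d_h.T @ P.T ~= x2d.T in the least-squares sense (one column per image axis).
    p_t, *_ = np.linalg.lstsq(x3d_h.T, x2d.T, rcond=None)
    return p_t.T  # (2, 4)


def project(p: np.ndarray, x3d: np.ndarray) -> np.ndarray:
    """Map (3, M) 3D landmarks to (2, M) image coordinates with camera p."""
    x3d_h = np.vstack([x3d, np.ones((1, x3d.shape[1]))])
    return p @ x3d_h
```

With noise-free synthetic data (at least four non-coplanar landmarks), `estimate_affine_projection(project(P, X), X)` recovers `P` up to floating-point error; with detector noise it returns the least-squares fit instead.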
By projecting each 3DSL via $P$, the corresponding 2D sparse landmark (2DSL) model $\chi_s^{2D} = (X_1^{2D}, \dots, X_N^{2D})$, where $X_i^{2D}$ is the vector representation of $M$ 2D landmarks, is generated online. By applying PCA to the 3DSL and 2DSL models, we derive compact representations $A_m$ and $A_n$ of the corresponding shapes, from which a PLS regression $P_{PLS}$ [2] is learned: $A_m = A_n P_{PLS}$. Following the same procedure, we compute the compact representation of $X_I^{2D}$ by solving for $a_n^I = U_s \ldots$. Then $a_m^I$ is recovered by $a_m^I = a_n^I P_{PLS}$, and the 3DSL is constructed through $X_R^{3D} = \bar{X}^{3D} + a_m^I U_s^{3D}$. After obtaining the 3DSL $X_R^{3D}$, we aim to reconstruct the 3DDL $Y_R^{3D}$. In the training phase, the correlation between the 3DSL and the 3DDL is implicitly learned in a coupled manner: $\arg\min \ldots$
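The PCA-then-regression chain that recovers $X_R^{3D}$ from $X_I^{2D}$ can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: shapes are flattened row vectors (matching the row-vector form $a_m^I U_s^{3D}$), the compact representation is a projection onto the leading principal directions, and ordinary least squares is swapped in for the PLS regression [2] to keep the example short. A PLS implementation would replace the hypothetical `fit_coefficient_map`; all function names are illustrative.

```python
import numpy as np


def pca(shapes: np.ndarray, k: int):
    """PCA of stacked shapes (N, D), one flattened shape per row.

    Returns the mean (D,), a basis U (k, D) with orthonormal rows, and the
    compact coefficients A (N, k) of the training shapes.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]  # leading principal directions as rows
    return mean, basis, centered @ basis.T


def fit_coefficient_map(a_n: np.ndarray, a_m: np.ndarray) -> np.ndarray:
    """Least-squares W with a_m ~= a_n @ W (a stand-in for PLS regression)."""
    w, *_ = np.linalg.lstsq(a_n, a_m, rcond=None)
    return w


def reconstruct_sparse_3d(x2d, mean2d, basis2d, mean3d, basis3d, w):
    """Recover flattened 3D sparse landmarks from flattened 2D landmarks."""
    a_n = (x2d - mean2d) @ basis2d.T  # compact 2D representation
    a_m = a_n @ w                     # map to the compact 3D representation
    return mean3d + a_m @ basis3d     # X_R = mean + a_m U
```

In use, the 2D training shapes would come from projecting every 3DSL with the estimated $P$ (the online 2DSL model), `pca` would be run on both sets, `fit_coefficient_map` would link the two coefficient matrices, and `reconstruct_sparse_3d` would be applied to the detected landmarks. Because PCA coefficients of centered data have zero mean, the linear map needs no intercept term.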
Facial landmark localization is a fundamental module for pose-invariant face recognition. The most common approach to facial landmark detection is cascaded regression, which is composed of two steps: feature extraction and facial shape regression. Recent methods employ deep convolutional networks to extract robust features for each step, so the whole system can be regarded as a deep cascaded regression architecture. In this work, instead of employing a deep regression network, a Globally Optimized Dual-Pathway (GoDP) deep architecture is proposed to identify the target pixels by solving a cascaded pixel labeling problem, without resorting to high-level inference models or complex stacked architectures. The proposed end-to-end system relies on distance-aware softmax functions and a dual-pathway proposal-refinement architecture. Results show that it outperforms state-of-the-art cascaded regression-based methods on multiple in-the-wild face alignment databases. The model achieves a normalized mean error (NME) of 1.84 on the AFLW database [1], outperforming 3DDFA [2] by 61.8%. Experiments on face identification demonstrate that GoDP, coupled with DPM-headhunter [3], improves the rank-1 identification rate by 44.2% compared to the Dlib [4] toolbox on a challenging database.
Recent works in human pose estimation [14, 15, 16] employ 2D score maps as the targets for inference. This modification enables gradient back-propagation between stages, allows 2D feedback loops, and hence yields an end-to-end model. Within this new family of key-point localization methods, the works of Peng et al. [16] and Xiao et al. [32] rely on a DeconvNet [31] architecture to localize facial landmarks. Even though they obtain impressive results by integrating the estimation with recurrent neural networks, the quality of face alignment is intrinsically limited by the low-quality confidence maps generated by the DeconvNet, as shown in Fig. 1. Wei et al. [15] proposed the convolutional pose machine (CPM), which employs a stacked cascaded architecture to refine body key-point predictions. This cascaded structure has multi-
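GoDP's distance-aware softmax and dual-pathway architecture are not specified in this excerpt, so the NumPy sketch below only illustrates the generic score-map idea the text describes: a Gaussian score map as a per-landmark target, localization posed as labeling one pixel with a softmax over all pixels, argmax decoding, and the NME metric. All function names and the Gaussian width `sigma` are hypothetical, and the NME normalizer `norm` is left to the caller, since the normalization behind the AFLW figure is not stated here.

```python
import numpy as np


def gaussian_score_map(h: int, w: int, center, sigma: float = 2.0) -> np.ndarray:
    """(H, W) score-map target with a Gaussian bump at center = (x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def spatial_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over all pixels of one (H, W) map: a distribution over pixels."""
    flat = logits.reshape(-1)
    flat = flat - flat.max()  # subtract the max for numerical stability
    p = np.exp(flat)
    return (p / p.sum()).reshape(logits.shape)


def pixel_labeling_loss(logits: np.ndarray, target_xy) -> float:
    """Cross-entropy of labeling the ground-truth pixel among all pixels."""
    x, y = target_xy
    return float(-np.log(spatial_softmax(logits)[y, x] + 1e-12))


def decode_argmax(prob: np.ndarray):
    """(x, y) coordinates of the most probable pixel."""
    y, x = np.unravel_index(np.argmax(prob), prob.shape)
    return float(x), float(y)


def nme(pred: np.ndarray, gt: np.ndarray, norm: float) -> float:
    """Mean Euclidean error over (L, 2) landmarks, divided by norm."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=1)) / norm)
```

For example, `decode_argmax(spatial_softmax(gaussian_score_map(32, 48, (10, 20))))` returns `(10.0, 20.0)`, since the softmax preserves the peak of the map. Treating localization as pixel labeling with a spatial softmax keeps every stage differentiable, which is the property the text credits score-map methods with.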