This paper presents a statistical face recognition algorithm that expresses face images in terms of orthogonal two-dimensional Gaussian-Hermite moments (2D-GHMs). The motivation for developing a 2D-GHM-based recognition algorithm includes the ability of these moments to capture higher-order hidden nonlinear 2D structures within images and the invariance of certain linear combinations of the moments to common geometric distortions in images. The key contribution of this paper is that the features of 2D faces are represented in terms of a statistically selected set of 2D discriminative GHMs (DGHMs), as opposed to the commonly chosen heuristic set consisting of only the first few orders of moments. In particular, the intraclass correlation coefficients of the entire set of moments of the training images are used to select only those moments that maximize the discrimination among the available classes. The naive Bayes classifier, which yields optimal performance in many statistical applications, is used for identification owing to the simplicity of its implementation when handling very large face databases. Experiments are conducted to evaluate the performance of the proposed recognition algorithm on comprehensive databases such as the AT&T, Face Recognition Grand Challenge (FRGC), Face Recognition Technology (FERET), Labeled Faces in the Wild (LFW), and YouTube, which contain face images or videos with significant variations in appearance, occlusion, expression, pose, resolution, and illumination in both constrained and unconstrained environments. In the constrained condition, comparisons with the well-established 2D-principal component analysis, 2D-linear discriminant analysis, and 2D-canonical correlation analysis methods, as well as the orthogonal 2D-Krawtchouk moment-based method, reveal the superior performance of the proposed method in terms of recognition accuracy for varying numbers of training and probe images. 
The proposed DGHM features also show superior recognition or verification performance under the standard protocols of the unconstrained face databases when compared with commonly used descriptors such as the local binary pattern and the scale-invariant feature transform.
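
The selection and classification steps summarized above can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes that the moments of each training image are already arranged as a row of a feature matrix, that the intraclass correlation coefficient takes the one-way random-effects form, and that the naive Bayes classifier uses Gaussian class-conditional densities. The function names `icc_oneway`, `select_discriminative`, `fit_gaussian_nb`, and `predict_gaussian_nb` are hypothetical and introduced only for illustration.

```python
import numpy as np

def icc_oneway(values, labels):
    """One-way random-effects ICC of one feature (assumed form, not confirmed by the paper)."""
    classes = np.unique(labels)
    n_classes, n_total = classes.size, values.size
    counts = np.array([(labels == c).sum() for c in classes])
    means = np.array([values[labels == c].mean() for c in classes])
    # Between-class and within-class mean squares from a one-way ANOVA.
    ms_between = np.sum(counts * (means - values.mean()) ** 2) / (n_classes - 1)
    ss_within = sum(((values[labels == c] - m) ** 2).sum() for c, m in zip(classes, means))
    ms_within = ss_within / (n_total - n_classes)
    # Effective class size; equals the common class size when classes are balanced.
    k0 = (n_total - np.sum(counts ** 2) / n_total) / (n_classes - 1)
    denom = ms_between + (k0 - 1) * ms_within
    return (ms_between - ms_within) / denom if denom > 0 else 0.0

def select_discriminative(X, y, top_k):
    """Return indices of the top_k columns of X with the largest ICC."""
    scores = np.array([icc_oneway(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:top_k]

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate per-class means, variances, and priors (Gaussian naive Bayes assumption)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, variances, priors

def predict_gaussian_nb(model, X):
    """Assign each row of X to the class with the largest log-posterior."""
    classes, means, variances, priors = model
    diff = X[:, None, :] - means[None, :, :]  # shape: (samples, classes, features)
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2)
    return classes[np.argmax(log_lik + np.log(priors)[None, :], axis=1)]
```

In such a sketch, a moment whose values vary little within a class but differ strongly across classes receives an ICC close to one and is retained, whereas a moment that behaves like noise across classes receives an ICC near zero and is discarded before the classifier is trained.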