We propose a new algorithm for sparse estimation of eigenvectors in generalized eigenvalue problems (GEP). The GEP arises in a number of modern data-analytic situations and statistical methods, including principal component analysis (PCA), multiclass linear discriminant analysis (LDA), canonical correlation analysis (CCA), sufficient dimension reduction (SDR), and invariant coordinate selection. We propose to modify the standard generalized orthogonal iteration with a sparsity-inducing penalty for the eigenvectors. To achieve this goal, we generalize the equation-solving step of orthogonal iteration to a penalized convex optimization problem. The resulting algorithm, called penalized orthogonal iteration, provides accurate estimation of the true eigenspace when it is sparse. Also proposed is a computationally more efficient alternative, which works well for PCA and LDA problems. Numerical studies reveal that the proposed algorithms are competitive and that our tuning procedure works well. We demonstrate applications of the proposed algorithm to obtain sparse estimates for PCA, multiclass LDA, CCA, and SDR. Supplementary materials are available online.

Song et al. (2015) later proposed several approximate solutions of (3). Recently, Tan et al. (2016) proposed to solve (2) by truncating the steepest-ascent iterates in maximizing the Rayleigh coefficient $u \mapsto u^T A u / u^T B u$. Gaynanova et al. (2017) pointed out a fundamental difference between the penalized and constrained optimizations for sparse GEP, similar to (2) and (3) but with the $\ell_1$-norm. Safo et al. (2018) proposed to estimate $u$ by minimizing $\|u\|_1$ subject to the constraint $\|A\tilde{u} - \lambda B u\|_\infty \le \rho$, where $(\lambda, \tilde{u})$ is the non-sparse solution of (1). As is evident in Sriperumbudur et al. (2011), Song et al. (2015), and Tan et al. (2016), who limit themselves to solving for only one eigen-pair, it is unclear how (2) or (3) generalizes to simultaneously solving for multiple eigenvectors $u_1, \ldots, u_d$. When multiple eigenvectors are needed, as is typical in practice, these methods are not readily applicable, at least not without a clever modification. Our algorithm is designed to estimate $u_1, \ldots, u_d$ altogether, and works well when $d > 1$.

Han and Clemmensen (2016) assumed $B$ to be positive definite and transformed the GEP into a regular eigen-decomposition of $B^{-1}A$ (or $B^{-1/2} A B^{-1/2}$), while applying an $\ell_1$ penalty to achieve sparsity. However, their method is not directly applicable to the large-$p$-small-$n$ case, due to the numerically unstable inverse of the large matrix $B$. They used the alternating direction method of multipliers for optimization, which makes their method computationally expensive.

Chen et al. (2010) proposed to solve SDR problems by maximizing $\mathrm{trace}(U^T A U) - \rho_\lambda(U)$, for $U \in \mathbb{R}^{p \times d}$ satisfying $U^T B U = I_d$, in which the penalty function $\rho_\lambda$ enforces coordinate-wise sparsity as in (10). While Chen et al. (2010)'s formulation is similar to our Fast POI with (10), their computation is much slower than any of our proposed algorithms, perhaps ...
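To make the truncated steepest-ascent idea attributed above to Tan et al. (2016) concrete, the following is a minimal Python sketch for a single eigen-pair. It is not the authors' implementation: the fixed step size, the random initialization, and the renormalization rule are our own simplifying assumptions, and the sketch assumes the leading generalized eigenvalue is positive.

```python
import numpy as np

def truncated_rayleigh_ascent(A, B, k, eta=0.05, n_iter=500, seed=0):
    """Sparse leading eigenvector of A u = lambda B u via truncated
    steepest ascent on the Rayleigh coefficient u^T A u / u^T B u.

    Hypothetical simplification of the scheme of Tan et al. (2016);
    assumes the leading generalized eigenvalue is positive.
    """
    p = A.shape[0]
    u = np.random.default_rng(seed).standard_normal(p)
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        rho = (u @ A @ u) / (u @ B @ u)            # current Rayleigh coefficient
        u = u + (eta / rho) * ((A - rho * B) @ u)  # steepest-ascent step
        keep = np.argsort(np.abs(u))[-k:]          # only k largest coordinates survive
        v = np.zeros(p)
        v[keep] = u[keep]                          # truncate the iterate
        u = v / np.linalg.norm(v)                  # renormalize to the unit sphere
    return u
```

The hard truncation after each ascent step is what keeps the iterate $k$-sparse; as the surrounding discussion notes, this construction targets a single eigenvector and does not extend in an obvious way to $u_1, \ldots, u_d$ jointly.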
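For contrast, the penalized orthogonal iteration described in the abstract can be sketched as follows. This is a hedged sketch, not the paper's exact algorithm: we assume the equation-solving step $U \leftarrow B^{-1} A U$ is replaced by the penalized convex problem $\min_U \tfrac{1}{2}\mathrm{trace}(U^T B U) - \mathrm{trace}(U^T A V) + \lambda \|U\|_1$, whose unpenalized minimizer is exactly $B^{-1} A V$, and we solve it with a generic proximal-gradient loop; the paper's actual penalty and inner solver may differ.

```python
import numpy as np

def penalized_orthogonal_iteration(A, B, d, lam=0.1, n_iter=50, n_inner=100):
    """Sketch of a penalized orthogonal iteration for the GEP A u = lambda B u.

    Replaces the equation-solving step U <- B^{-1} A U of generalized
    orthogonal iteration with a proximal-gradient solve of the convex problem
        min_U 0.5 * tr(U^T B U) - tr(U^T A V) + lam * ||U||_1,
    whose unpenalized minimizer is B^{-1} A V.  The objective and solver
    here are illustrative assumptions, not the paper's verbatim method.
    """
    p = A.shape[0]
    rng = np.random.default_rng(0)
    U = np.linalg.qr(rng.standard_normal((p, d)))[0]  # random orthonormal start
    step = 1.0 / np.linalg.norm(B, 2)                 # 1/L, L = Lipschitz constant of the gradient
    for _ in range(n_iter):
        V = A @ U                 # right-hand side of the equation-solving step
        W = U.copy()
        for _ in range(n_inner):  # proximal gradient on the penalized subproblem
            W = W - step * (B @ W - V)                                # gradient step
            W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)  # soft-threshold (prox of l1)
        U = np.linalg.qr(W)[0]    # re-orthonormalize, as in orthogonal iteration
    return U
```

As a hypothetical usage, with `A` and `B` taken to be the between-class and within-class scatter matrices of a multiclass LDA problem, `penalized_orthogonal_iteration(A, B, d=2)` would return a sparse estimate of a basis for the discriminant subspace; setting `lam=0` recovers plain generalized orthogonal iteration.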