We present a computationally efficient approach to perform large-scale all-electron density functional theory calculations by enriching the classical finite element basis with compactly supported atom-centered numerical basis functions that are constructed from the solution of the Kohn-Sham (KS) problem for single atoms. We term these numerical basis functions as enrichment functions, and the resultant basis as the enriched finite element basis. The compact support for the enrichment functions is obtained by using smooth cutoff functions, which enhances the conditioning and maintains the locality of the enriched finite element basis. The integrals involved in the evaluation of the discrete KS Hamiltonian and overlap matrix in the enriched finite element basis are computed using an adaptive quadrature grid that is constructed based on the characteristics of enrichment functions. Further, we propose an efficient scheme to invert the overlap matrix by using a block-wise matrix inversion in conjunction with special reduced-order quadrature rules, which is required to transform the discrete Kohn-Sham problem to a standard eigenvalue problem. Finally, we solve the resulting standard eigenvalue problem, in each self-consistent field iteration, by using a Chebyshev polynomial based filtering technique to compute the relevant eigenspectrum. We demonstrate the accuracy, efficiency and parallel scalability of the proposed method on semiconducting and heavymetallic systems of various sizes, with the largest system containing 8694 electrons. We obtain accuracies in the ground-state energies that are ∼ 1 mHa with reference ground-state energies employing classical finite element as well as gaussian basis sets. Using the proposed formulation based on enriched finite element basis, for accuracies commensurate with chemical accuracy, we observe a staggering 50−300 fold reduction in the overall computational time when compared to classical finite element basis. Further, we find a significant outperformance by the enriched finite element basis when compared to the gaussian basis for the modest system sizes where we obtained convergence with gaussian basis. We also observe good parallel scalability of the numerical implementation up to 384 processors for a representative benchmark system comprising of 280-atom silicon nano-cluster.