Purpose
The emerging field of educational data mining provides an opportunity to process large-scale data generated by higher education institutions (HEIs) into reliable knowledge. The purpose of this paper is to examine factors influencing persistence amongst students enrolled in a Chemistry major at a South African university using enrolment data.

Design/methodology/approach
The sample consisted of 511 students who registered for a Chemistry major beginning in 2012, 2013 and 2014. Descriptive statistics (counts and percentages) and classification (decision) tree methods were used in the analysis.

Findings
Graduation from the Chemistry major is most likely to occur after 4 years, which is regulation time plus 1 year, whilst departure mainly occurs in the first year of study. Classification tree modelling demonstrated that first year accumulated credits (FYAC), gender, financial aid status and school quintile were the factors associated with persistence. FYAC was the most critical factor.

Research limitations/implications
Although this study has many strengths, notably the use of data mining methods to classify students, some limitations might affect how the results are interpreted. First, the analysis focused on a single degree major at one institution, which raises the possibility that the observed results are discipline- or institution-specific. Thus, the findings cannot be generalised to other contexts or disciplines. Second, with so many potential factors influencing student persistence, the analysis presented in this paper is by no means exhaustive, because it was limited to the covariates available in the institutional dataset the authors used. Some factors that are not included in the present analysis might have more predictive power.

Originality/value
Globally, university administrators are interested in predicting student outcomes and understanding the intricate balance between enrolment and throughput.
Thus, whilst the findings from this study have an institutional focus, they are relevant to other HEIs and present an alternative and highly visual way of identifying specific combinations of factors associated with persistence. A classification tree model can also be used to identify students at risk and to inform the development of interventions that will support them.
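
As an illustration of how a classification tree might select a first split on a numeric covariate such as FYAC, the following minimal Python sketch searches for the threshold that minimises weighted Gini impurity. It is a sketch under stated assumptions, not the authors' implementation: the abstract does not say which splitting criterion or software was used, Gini impurity is assumed here, the function names `gini` and `best_split` are hypothetical, and the credit values and outcomes are invented toy records rather than data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    Returns (None, parent_impurity) if no split improves on the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    # Skip the largest value: splitting there would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented toy records (assumption, NOT the study's data):
# first year accumulated credits and whether the student persisted (1) or not (0).
fyac = [16, 32, 48, 64, 96, 112, 128, 128]
persisted = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(fyac, persisted)
print(f"split at FYAC <= {threshold}, weighted Gini = {impurity:.3f}")
```

A full tree would repeat this search across all covariates (for example gender, financial aid status and school quintile) and recurse within each branch, which is what produces the visual combinations of factors described above.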