Abstract: In recent years, data mining has been used in educational settings to extract and manipulate data and to establish patterns that yield useful information for decision making. Higher education institutions increasingly need to be better informed about their students and to understand some of the reasons behind students' choices to enroll and pursue careers. One way to do this is for such institutions to mine, process, and analyze the data they accumulate about their students. In this paper, we propose a general framework for mining data on students enrolled in Science, Technology, Engineering and Mathematics (STEM) using performance-weighted ensemble classifiers. We train an ensemble of classification models on enrollment data streams to improve the quality of student data by eliminating noisy instances, and hence to improve predictive accuracy. We empirically compare our technique with single-model techniques and show that ensemble models not only give better predictive accuracy for student enrollment in STEM, but also provide better rules for understanding the factors that influence student enrollment in STEM disciplines.
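To make the idea of a performance-weighted ensemble concrete, the following is a minimal sketch and not the framework evaluated in this paper. It assumes that each base model is weighted by its accuracy on a labeled validation chunk, and that an instance counts as noisy when its label disagrees with the weighted ensemble vote. The three rule-based classifiers, the `(math_score, science_score)` features, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def accuracy(model, data):
    """Fraction of (features, label) pairs that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

def weighted_vote(models, weights, x):
    """Return the label with the largest total weight across model votes."""
    totals = defaultdict(float)
    for model, w in zip(models, weights):
        totals[model(x)] += w
    return max(totals, key=totals.get)

def filter_noisy(models, weights, data):
    """Keep only instances whose label agrees with the weighted ensemble vote."""
    return [(x, y) for x, y in data if weighted_vote(models, weights, x) == y]

# Hypothetical base classifiers over (math_score, science_score) features.
models = [
    lambda x: "STEM" if x[0] >= 60 else "non-STEM",
    lambda x: "STEM" if x[1] >= 60 else "non-STEM",
    lambda x: "STEM" if x[0] + x[1] >= 130 else "non-STEM",
]

# Assumed labeled validation chunk used to derive performance weights.
validation = [
    ((80, 70), "STEM"),
    ((40, 50), "non-STEM"),
    ((65, 45), "non-STEM"),
    ((55, 85), "STEM"),
]
weights = [accuracy(m, validation) for m in models]

# A new chunk from the stream; the second instance has a suspicious label.
chunk = [
    ((75, 80), "STEM"),
    ((30, 40), "STEM"),
    ((70, 50), "non-STEM"),
]
cleaned = filter_noisy(models, weights, chunk)
print(weights)
print(cleaned)
```

In this toy run the first rule is right on only half of the validation chunk, so it carries half the weight of the other two rules, and the instance labeled "STEM" despite low scores on both features is dropped because every model votes against its label.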