Recently, stream data mining applications have drawn considerable attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning field has developed learning algorithms that make certain assumptions about the underlying distribution of the data, for example that the data follow a predetermined distribution. Such constraints on the problem domain paved the way for learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model. Applications usually suffer from problems such as unbalanced data distribution. Additionally, data drawn from non-stationary environments are common in real-world applications, giving rise to "concept drift", which is associated with data stream examples. Researchers have addressed these issues separately, but the joint problem of class imbalance and concept drift has received relatively little attention. If the final objective of intelligent machine learning techniques is to address a broad spectrum of real-world applications, then the need for a universal framework for learning from, and adapting to, environments where concept drift may occur and unbalanced data distribution is present can hardly be overstated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a complete review of recent research dealing with each issue.
It is necessary to use a student dataset in order to analyze students' performance for future improvements in study methods and the overall curriculum. Incremental learning methods are becoming popular as the amount of data and information grows day by day. A classifier needs to be updated in order to scale learning up to more training data. Incremental learning is a technique in which data is processed in chunks and the results are merged so that less memory is required. For this reason, this paper compares four classifiers that can run incrementally: Naive Bayes, KStar, IBK and nearest neighbor (KNN). The nearest neighbor algorithm is observed to give better accuracy than the others when applied to the Student Evaluation dataset used here.
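The chunk-wise updating described above can be illustrated with a minimal sketch. It assumes a categorical Naive Bayes classifier whose counts are simply accumulated chunk by chunk; the class name `IncrementalNaiveBayes`, the method names, and the toy rows are hypothetical and chosen only for illustration. This is not the implementation evaluated in the paper, and the data below are invented rather than taken from the Student Evaluation dataset.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical Naive Bayes whose counts are updated one chunk at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed default)
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature_index][value] -> number of occurrences
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int))
        )
        # distinct values seen so far for each feature index
        self.feature_values = defaultdict(set)

    def partial_fit(self, rows, labels):
        """Fold one chunk into the running counts; the chunk can then be discarded."""
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.feature_values[i].add(value)

    def predict(self, row):
        """Return the label with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(row):
                count = self.feature_counts[label][i].get(value, 0)
                vocab = len(self.feature_values[i]) or 1
                score += math.log((count + self.alpha) / (n + self.alpha * vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: (attendance, homework_done) -> performance label.
chunks = [
    ([("high", "yes"), ("low", "no")], ["good", "poor"]),
    ([("high", "no"), ("low", "yes"), ("low", "no")], ["good", "poor", "poor"]),
]

model = IncrementalNaiveBayes()
for rows, labels in chunks:
    model.partial_fit(rows, labels)  # only one chunk is held in memory at a time

prediction = model.predict(("high", "yes"))
```

Because the model state consists only of counts, merging a new chunk is a matter of adding to those counts, so memory use depends on the number of distinct classes and feature values rather than on the number of training rows seen.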