Mining data streams is a challenging task that requires online systems based on incremental learning approaches. This paper describes a classification system based on decision rules that may store up-to-date border examples to avoid unnecessary revisions when virtual drifts are present in data. Consistent rules classify new test examples by covering and inconsistent rules classify them by distance as the nearest neighbor algorithm. In addition, the system provides an implicit forgetting heuristic so that positive and negative examples are removed from a rule when they are not near one another.
Mining data streams is a challenging task that requires online systems based on incremental learning approaches. This paper describes a classification system based on decision rules that may store up-to-date border examples to avoid unnecessary revisions when virtual drifts are present in data. Consistent rules classify new test examples by covering and inconsistent rules classify them by distance as the nearest neighbor algorithm. In addition, the system provides an implicit forgetting heuristic so that positive and negative examples are removed from a rule when they are not near one another.
This paper presents a scalable learning algorithm to classify numerical, low dimensionality, high-cardinality, time-changing data streams. Our approach, named SCALLOP, provides a set of decision rules on demand which improves its simplicity and helpfulness for the user. SCALLOP updates the knowledge model every time a new example is read, adding interesting rules and removing out-of-date rules. As the model is dynamic, it maintains the tendency of data. Experimental results with synthetic data streams show a good performance with respect to running time, accuracy and simplicity of the model.
Abstract.As an analysis of the classification accuracy bound for the Nearest Neighbor technique, in this work we have studied if it is possible to find a good value of the parameter k for each example according to their attribute values. Or at least, if there is a pattern for the parameter k in the original search space. We have carried out different approaches based on the Nearest Neighbor technique and calculated the prediction accuracy for a group of databases from the UCI repository. Based on the experimental results of our study, we can state that, in general, it is not possible to know a priori a specific value of k to correctly classify an unseen example.
Great organizations collect open-ended and time-changing data received at a high speed. The possibility of extracting useful knowledge from these potentially infinite databases is a new challenge in Data Mining. In this paper we propose an anytime incremental learning algorithm for mining numeric data streams. Within Supervised Learning, our approach is based on prototypes and hypercubic decision rules, concerning with the simplicity of the model provided and the time complexity as primary goals. Experimental results with synthetic databases of 100 gigabytes show a good performance from streams of data in continuous transformation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.