In practical applications, data stream classification faces significant challenges, such as the high cost of labeling instances and potential concept drift. We present a new online active learning ensemble framework for drifting data streams based on a hybrid labeling strategy. The framework has two components: 1) an ensemble classifier, which consists of a long-term stable classifier and multiple dynamic classifiers (a multilevel sliding window model is used to create and update the dynamic classifiers so that both gradual and sudden drift can be handled effectively) and 2) active learning, which uses a nonfixed labeling budget, supports on-demand label requests, and combines an uncertainty strategy with a random strategy to select instances for labeling. The decision threshold of the uncertainty strategy is adjusted dynamically: when concept drift occurs, the threshold is gradually reduced so that the most uncertain instances are queried first, keeping the labeling expense as low as possible. Experiments on synthetic and real data sets show that the proposed method obtains high prediction accuracy without increasing the total labeling cost, and that the labeling cost can be dynamically allocated according to the concept drift.
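The hybrid labeling strategy described above can be illustrated with a minimal sketch. The abstract does not give the update rule, so the margin-based uncertainty measure, the multiplicative `decay` factor, the `random_rate` parameter, and all class and method names below are assumptions made for illustration, not the authors' implementation.

```python
import random


class HybridLabelingStrategy:
    """Illustrative query rule: uncertainty strategy plus random strategy.

    All parameter names and default values are hypothetical.
    """

    def __init__(self, threshold=0.4, decay=0.9, random_rate=0.05, seed=0):
        self.threshold = threshold      # margin below which a label is requested
        self.decay = decay              # assumed factor for shrinking the threshold
        self.random_rate = random_rate  # probability of a random label request
        self.rng = random.Random(seed)

    @staticmethod
    def margin(class_probs):
        """Difference between the two largest class probabilities."""
        top = sorted(class_probs, reverse=True)
        return top[0] - top[1]

    def should_query(self, class_probs, drift_detected=False):
        # While drift is flagged, shrink the threshold step by step so that
        # only the most uncertain instances still trigger a request.
        if drift_detected:
            self.threshold *= self.decay
        # Uncertainty strategy: small margin means the ensemble is unsure.
        if self.margin(class_probs) < self.threshold:
            return True
        # Random strategy: occasionally query regardless of confidence.
        return self.rng.random() < self.random_rate
```

In this sketch the labeling budget is not fixed in advance: the number of requests depends on how many instances fall below the current threshold, which is one way to read "on-demand request labeling". How the threshold recovers after a drift ends is not stated in the abstract and is left out here.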
Machine learning in real-world scenarios is often challenged by concept drift and class imbalance. This paper proposes a Resample-based Ensemble Framework for Drifting Imbalanced Stream (RE-DI). The ensemble framework consists of a long-term static classifier to handle gradual concept drift and multiple dynamic classifiers to handle sudden concept drift. The weights of the ensemble classifier are adjusted in two ways. First, a time-decayed strategy decreases the weights of the dynamic classifiers so that the ensemble focuses more on the new concept of the data stream. Second, a novel reinforcement mechanism increases the weights of the base classifiers that perform better on the minority class and decreases the weights of those that perform worse. A resampling buffer stores instances of the minority class to balance the imbalanced distribution over time. In our experiments, we compare the proposed method with other state-of-the-art algorithms on both real-world and synthetic data streams. The results show that the proposed method achieves the best performance in terms of both prequential AUC and accuracy.
Index Terms: Online ensemble learning, resample learning, reinforcement, concept drift, class imbalance.
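The two weight adjustments in RE-DI can be sketched as a single per-instance update. The abstract does not specify the formulas, so the multiplicative form, the `decay` and `reward` parameters, and the function name are hypothetical choices for illustration only.

```python
def update_weights(weights, predictions, true_label, minority_label,
                   decay=0.01, reward=0.1):
    """Return updated weights for the dynamic classifiers after one instance.

    weights      -- current weight of each dynamic classifier
    predictions  -- label predicted by each classifier for this instance
    decay, reward -- hypothetical rates; the paper's values are not given here
    """
    updated = []
    for w, pred in zip(weights, predictions):
        # Time decay: every dynamic classifier loses a little weight per
        # instance, so older members gradually matter less.
        w *= (1.0 - decay)
        # Reinforcement: applied only on minority-class instances.
        if true_label == minority_label:
            if pred == true_label:
                w *= (1.0 + reward)   # correct on the minority class
            else:
                w *= (1.0 - reward)   # wrong on the minority class
        updated.append(w)
    return updated
```

For example, `update_weights([1.0, 1.0], [1, 0], true_label=1, minority_label=1, decay=0.0)` raises the first weight and lowers the second, because only the first classifier was correct on the minority-class instance. The resampling buffer mentioned in the abstract is not modeled in this sketch.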