Recent advancements in data mining and knowledge discovery have opened up a multitude of research opportunities related to streaming data. One major challenge in handling streaming data is building efficient machine-learning models that cope with the dynamics of features and concepts. On the feature side, feature drift and evolving features remain among the least addressed issues. Dealing with the evolution of features in a stream is difficult because most machine learning algorithms are restricted to learning from a fixed number of features. In the proposed work, a machine learning framework for data streams with feature evolution is introduced. The methodology uses a dynamic autoencoder to translate varying feature sets into a feature space of fixed dimension. For classification, we constructed an ensemble of Logistic Regression, a Decision Tree, a Support Vector Machine, and K-Nearest Neighbor (KNN), which preserves past concepts. Experimental results show the model to be promising on a variety of datasets, including Weather data (86% accuracy), Electricity data (94%), and Forest Cover Type data (95%). By effectively combining deep-learning techniques with traditional approaches, many streaming data challenges can be addressed.
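The ensemble idea above (several base learners voting on a fixed-dimension encoded input, with older members retained to preserve past concepts) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: base models are plain callables standing in for trained Logistic Regression, Decision Tree, SVM, and KNN classifiers; simple majority voting is assumed as the combination rule; and `StreamEnsemble`, `majority_vote`, and `max_members` are hypothetical names.

```python
from collections import Counter, deque
from typing import Callable, Deque, Sequence

# Assumption: a base learner is any callable mapping an encoded
# (fixed-length) feature vector to a class label.
Model = Callable[[Sequence[float]], str]


def majority_vote(models: Sequence[Model], x: Sequence[float]) -> str:
    """Return the label predicted by the most members.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the label voted by the earliest member."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


class StreamEnsemble:
    """Hypothetical pool of models trained on successive chunks.

    Keeping older members (up to `max_members`) is one simple way to
    retain past concepts; the oldest member is evicted when full."""

    def __init__(self, max_members: int = 8) -> None:
        self.members: Deque[Model] = deque(maxlen=max_members)

    def add(self, model: Model) -> None:
        self.members.append(model)

    def predict(self, x: Sequence[float]) -> str:
        if not self.members:
            raise ValueError("ensemble has no members yet")
        return majority_vote(list(self.members), x)


# Usage with stand-in models (the lambdas are placeholders, not trained classifiers).
ens = StreamEnsemble(max_members=3)
ens.add(lambda x: "rain" if x[0] > 0.5 else "dry")  # stand-in for, e.g., logistic regression
ens.add(lambda x: "rain" if x[1] > 0.2 else "dry")  # stand-in for, e.g., a decision tree
ens.add(lambda x: "dry")                            # stand-in for, e.g., k-NN
print(ens.predict([0.9, 0.3]))  # rain
```

Bounding the pool with `deque(maxlen=...)` keeps memory constant on an unbounded stream; a weighted vote (for example, by each member's recent accuracy) would be a natural refinement but is not something the abstract specifies.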
Clustering streaming data is challenging because of temporal dynamics such as concept drift, concept evolution, and feature evolution. Concept evolution is the most challenging of these: new classes may emerge or existing classes may disappear, so it is crucial to process streaming data continuously. This paper proposes a novel online clustering method designed specifically for streaming data with concept evolution. It consists of three phases: initialization, clustering, and outlier handling. To identify recurrences of previous data in a stream, it is critical to preserve the sequential properties of data chunks. In the proposed model, representatives from previous windows are added to the current window, which distinguishes it from existing models. Detecting and handling outliers are very challenging tasks in streaming data analysis, and outliers are often the first instances of a new cluster. The proposed model therefore stores the outliers from each data window; when the number of stored outliers exceeds a certain threshold, representatives of the outliers are added to the next window to identify new classes. To address the lack of datasets for training such models, we created a synthetic dataset with 22,020 data instances. Evaluated with the Silhouette Coefficient, the Calinski-Harabasz Index, and the Davies-Bouldin Index, this model yielded the most favourable results.
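The window-by-window procedure described above (carrying representatives forward, buffering outliers, and injecting an outlier representative once a threshold is passed) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: nearest-centroid assignment within a fixed `radius` stands in for the clustering phase, a cluster's mean stands in for its representative, and letting the outlier representative also seed a new centroid is a hypothetical rule. The names `cluster_stream`, `assign`, `radius`, and `outlier_threshold` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean, used here (assumption) as a representative."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def assign(points: Sequence[Point], centroids: Sequence[Point], radius: float):
    """Stand-in clustering step: nearest-centroid assignment.

    Points farther than `radius` from every centroid are outliers."""
    members: Dict[int, List[Point]] = {i: [] for i in range(len(centroids))}
    outliers: List[Point] = []
    for p in points:
        if centroids:
            best = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            if math.dist(p, centroids[best]) <= radius:
                members[best].append(p)
                continue
        outliers.append(p)
    return members, outliers


def cluster_stream(windows, centroids, radius=1.0, outlier_threshold=3):
    """Process windows in order and return the final centroids."""
    centroids = list(centroids)           # output of the initialization phase
    carried: List[Point] = []             # representatives from the previous window
    outlier_buffer: List[Point] = []      # outliers stored across windows
    for window in windows:
        # Representatives of the previous window join the current one.
        members, outliers = assign(list(window) + carried, centroids, radius)
        # Refresh each centroid from its members; keep it if it got none.
        centroids = [centroid(members[i]) if members[i] else c
                     for i, c in enumerate(centroids)]
        outlier_buffer.extend(outliers)
        carried = list(centroids)
        if len(outlier_buffer) > outlier_threshold:
            rep = centroid(outlier_buffer)
            carried.append(rep)       # outlier representative goes to the next window
            centroids.append(rep)     # hypothetical rule: seed a candidate new cluster
            outlier_buffer.clear()
    return centroids


result = cluster_stream(
    windows=[
        [(0.1, 0.0), (-0.1, 0.0), (5.0, 5.0), (5.2, 5.0)],
        [(0.0, 0.1), (5.1, 4.9), (4.9, 5.1)],
    ],
    centroids=[(0.0, 0.0)],
)
print(len(result))  # 2: the original cluster plus one seeded from outliers
```

In this toy run the points near (5, 5) are outliers in both windows; only after the buffer exceeds the threshold does their representative appear as a second cluster, mirroring the intuition that outliers are often the first instances of a new class.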