2017
DOI: 10.18637/jss.v076.i14

Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R

Abstract: In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly and it is important to evaluate them thoroughly …

citations
Cited by 43 publications
(30 citation statements)
references
References 47 publications
“…In this article, we will discuss methods of data stream clustering. There are various implementations dedicated for this purpose, such as the stream package in R [11], Massive Online Analysis software written in Java [12] or scikit-learn library in Python [13]. Unfortunately, this software provides methods for only one stream and not multiple stream processing.…”
mentioning
confidence: 99%
“…It is a java‐based software package that contains state‐of‐the‐art algorithms and evaluations measures for running experiments. Recently, an R package called Steam published that allows to perform clustering experiments, the main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R.…”
Section: Methods
mentioning
confidence: 99%
“…It is a java-based software package that contains state-of-the-art algorithms and evaluations measures for running experiments. Recently, an R package called Steam [27] published that allows to perform clustering experiments, the main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. Also, in order to clear any ambiguity between the terms window size w and horizon H, window size w defines the number of data objects which arrive in each time period and horizon means the number of windows in which the clustering analysis is evaluated; we have set for the entire evaluation analysis that horizon. H = 1.…”
Section: More Experimental Settings
mentioning
confidence: 99%
“…The extended D-Stream algorithm increases the cluster feature but can also have the reverse effect. Introduced a research tool that includes modelling and simulating data streams [18] extensible framework used for implementing, interfacing and experimenting with algorithms for several data stream mining tasks. Dataset Used-KDD Cup'99 dataset used.…”
Section: Literature Review
mentioning
confidence: 99%