2015
DOI: 10.1016/j.procs.2015.07.290

Micro-Batching Growing Neural Gas for Clustering Data Streams Using Spark Streaming

Cited by 18 publications (9 citation statements)
References 8 publications

“…Among them, Spark [20] highlights as one of the most flexible and powerful engines to performed faster distributed computing in big data by using in-memory primitives. This platform allows user programs to load data into memory and query it repeatedly, making it more suitable for online, iterative or data streams algorithms [21].…”
Section: ACCEPTED M
Citation type: mentioning
confidence: 99%
“…Moreover, it cannot deal with categorical data with the conventional principal component analysis. Although other distributed clustering algorithms for heterogeneous datasets are proposed, e.g., OPTICS algorithm [21] and the SDBDC algorithm [17], these methods assume clusters of similar density, and may have problems separating nearby clusters [22] and the appropriate choice of parameters, such as the radius parameter, which is still an open issue [23]. In sum, none of the existing algorithms adequately address the problems we have outlined here.…”
Section: Distributed Clustering Algorithms
Citation type: mentioning
confidence: 99%
“…However, the design of a "distributed" version of G-Stream would raise difficulties, which are resolved by MBG-Stream [35]. This later operates with parameters to control the decay (or "forgetfulness") of the estimates.…”
Section: G-stream
Citation type: mentioning
confidence: 99%
“…In the streaming clustering point of view, Spartakus is an open-source project on top of Spark-notebook which provides front-end packages for some clustering algorithms implemented using the MapReduce framework. This includes the MBG-Stream algorithm [35] (detailed in "Background" section) with an integrated interface for execution and visualization checks. MLlib [64] gives implementations of some clustering algorithms, especially a Streaming k-means open-source code.…”
Section: Spark Streaming
Citation type: mentioning
confidence: 99%