2015 IEEE International Congress on Big Data 2015
DOI: 10.1109/bigdatacongress.2015.12
A Parallel Distributed Weka Framework for Big Data Mining Using Spark

Abstract: Effective Big Data Mining requires scalable and efficient solutions that are also accessible to users of all levels of expertise. Despite this, many current efforts to provide effective knowledge extraction via large-scale Big Data Mining tools focus more on performance than on ease of use and tuning, which are complex problems even for experts. Weka is a popular and comprehensive Data Mining workbench with a well-known and intuitive interface; nonetheless, it supports only sequential single-node execution. Henc…

Cited by 50 publications (18 citation statements)
References 17 publications (17 reference statements)
“…They evaluated strong- and weak-scaling workloads on 5 GB, 20 GB, and 80 GB datasets using 8-, 32-, and 128-core systems. Furthermore, for large datasets the Spark approach shows near-linear strong-scaling efficiency [18].…”
Section: Intrusion Detection System On Big Data Using Deep Learning T…
confidence: 99%
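The strong-scaling efficiency the excerpt refers to can be made concrete with the standard formula E(p) = T(1) / (p · T(p)), where T(p) is the runtime on p cores; values near 1.0 indicate near-linear scaling. A minimal sketch follows; the runtimes used are illustrative placeholders, not measurements from the paper.

```python
# Strong scaling: fixed total problem size, growing core count.
# E(p) = T(1) / (p * T(p)); efficiency near 1.0 means near-linear speedup.

def strong_scaling_efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency on p cores relative to ideal linear speedup."""
    return t1 / (p * tp)

# Hypothetical runtimes in seconds for a fixed workload (illustrative only):
t1 = 1024.0                          # single-core baseline
runs = {8: 140.0, 32: 40.0, 128: 12.0}

for p, tp in runs.items():
    print(f"{p:4d} cores: efficiency = {strong_scaling_efficiency(t1, tp, p):.2f}")
```

Weak scaling, by contrast, grows the problem size proportionally with the core count and compares runtimes directly.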
“…As datasets become larger and more complicated, an extreme learning machine (ELM) that runs in a traditional sequential environment cannot realize its potential to be fast and effective [2]. Effective Big Data Mining requires scalable and efficient solutions that are also easily accessible to users at all levels of expertise [20]. Big Data is “high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” [1].…”
Section: Introduction
confidence: 99%
“…Using built-in libraries separately, existing systems have managed to use Spark for such analysis of Big Data [16]. There are currently two LDA systems on Spark, namely Spark LDA [19] and the official one in MLlib [20] (which uses variational inference rather than CGS). However, these systems perform and scale poorly, since both have high computational cost [9].…”
Section: Introduction
confidence: 99%
“…Such distributed and parallelized processing leads to high scalability, high efficiency, fault tolerance, and load balancing over huge data sets. Google first introduced MapReduce as a programming model, and Apache Hadoop from Yahoo is considered the first implementation of MapReduce; it became the most popular platform for large-scale distributed data processing [1]. Granular computing has emerged as a new, rapidly growing information-processing paradigm under the umbrella of Computational Intelligence.…”
Section: Introduction
confidence: 99%
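The MapReduce model mentioned above can be sketched in a few lines: map emits (key, value) pairs, a shuffle groups them by key, and reduce folds each group. This is a minimal single-process illustration of the programming model only; real frameworks such as Hadoop and Spark distribute these phases across cluster nodes with fault tolerance.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record: str):
    # Map: emit one (word, 1) pair per token in the input record.
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each key's values into a single result (here, a sum).
    return {key: sum(values) for key, values in groups.items()}

records = ["big data mining", "big data frameworks"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(r) for r in records)))
print(counts)  # {'big': 2, 'data': 2, 'mining': 1, 'frameworks': 1}
```

Word count is the canonical MapReduce example because the reduce step (summation) is associative, which is what lets the framework combine partial results from many nodes.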