2018
DOI: 10.11591/ijece.v8i3.pp1854-1862

A Survey of Machine Learning Techniques for Self-tuning Hadoop Performance

Abstract: The Apache Hadoop framework is an open source implementation of MapReduce for processing and storing big data. However, getting the best performance from it is a big challenge because of its large number of configuration parameters. In this paper, critical issues of the Hadoop system, big data, and machine learning are highlighted, and an analysis of some machine learning techniques applied so far to improve Hadoop performance is presented. Then, a promising machine learning technique usin…

Citations: cited by 9 publications (8 citation statements)
References: 14 publications
“…The genetic algorithm [14] is a popular algorithm in parameter optimization. MSET [9] is a typical parameter optimization algorithm based on the genetic algorithm.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…At present, the methods of configuration parameter optimization for MapReduce mainly include the combination of configuration parameters, and parameter optimization methods based on simulators, experience principles, and machine learning [10], [11], [14], [16]. In the process of parameter optimization, these methods take all parameters into account.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…The traditional computing system cannot offer the necessary efficiency and performance. Therefore, the big data industries have seen various platforms such as Spark [4], Hadoop [5,6] and Storm [7] to entertain the demands of a large amount of big data processing. Apache Spark is one of the most widespread frameworks among the prevailing distributed frameworks, due to its great capability to sustain heavy applications and for complex data processing performance [2,4].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Multinode cluster‐based Hadoop framework structure distributed locally or in remote locations works efficiently to the storage and processing of the big data (ie, on the client‐server architecture). Figure shows the basic block diagram of Hadoop ecosystem representing 2 remote locations where the data nodes are situated and a third remote location with name node, for controlling the Hadoop multinode cluster …”
Section: Introduction
Citation type: mentioning (confidence: 99%)