A smart method for spark using neural network for big data

Rahman, Md. Armanur; Hossen, J.; Sultana, Aziza; Mamun, Abdullah Al; Aziz, Nor Azlina Ab.

doi:10.11591/ijece.v11i3.pp2525-2534

Cited by 3 publications

(2 citation statements)

References 24 publications

“…It has the advantage of having an automatic data balancing and a distributed design, making it a good choice for big data analysis. A collection of dominating people in occupations for numerous machine learning tasks, such as classification, regression, base compilation and extraction (and dimensional reduction), is introduced by Apache Spark [47]-[51]. Despite the fact that numerous research have been conducted on machine learning and its usefulness, ML libraries for big data analysis, such as Apache Spark MLlib, have received little attention.…”

Section: Introduction (mentioning)

confidence: 99%

“…Lunga et al [85] proposed a framework that makes use of Spark's distributed computing capabilities as well as deep learning architecture for multiple layers perceptron (MLP) using cascade learning to train multiple layers perceptrons is proposed. A framework for in-depth training learning models with Apache Spark has been created and developed in [47], [48], [50], [51], [57], [69], [86]-[92]. This framework shortens the training time by taking advantage of the advantages of both data and parity modeling at the same time.…”

Section: Introduction (mentioning)

confidence: 99%

Large scale data analysis using MLlib

et al. 2021

Recent advancements in the internet, social media, and Internet of Things (IoT) devices have significantly increased the amount of data generated in a variety of formats. The data must be converted into formats that are easily handled by data analysis techniques. It is mathematically and physically expensive to apply machine learning algorithms to big and complicated data sets. It is a resource-intensive process that necessitates a huge amount of logical and physical resources. Machine learning is a sophisticated data analytics technology that has gained in importance as a result of the massive amount of data generated daily that needs to be examined. Apache Spark machine learning library (MLlib) is one of the big data analysis platforms that provides a variety of outstanding functions for various machine learning tasks, spanning from classification to regression and dimension reduction. From a computational standpoint, this research investigated Apache Spark MLlib 2.0 as an open source, autonomous, scalable, and distributed learning library. Several real-world machine learning experiments are carried out in order to evaluate the properties of the platform on a qualitative and quantitative level. Some of the fundamental concepts and approaches for developing a scalable data model in a distributed environment are also discussed.

Section: Introduction (mentioning)

confidence: 99%

Performance Evaluation of Machine Learning Models on Apache Spark: An Empirical Study

Yamani, Alsunaidi, Boudellioua 2022

2022 14th International Conference on Computational Intelligence and Communication Networks (CICN)

A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization

Huang, Zhang, Zhai 2022

Sensors

Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters for users to manually select the appropriate parameter values according to their own experience. However, due to the large number of parameters and the inherent correlation between them, manual tuning is very tedious. To solve the problem of tuning through personal experience, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model with deep neural networks, and verified the accuracy and effectiveness of the model from multiple perspectives. Second, in order to improve the search efficiency of better configuration parameters, we improved the Q-learning algorithm, and automatically set start and end states in each iteration of training, which effectively improves the agent’s poor performance in exploring better configuration parameters. Lastly, comparing our proposed configuration with the default configuration as the baseline, experimental results show that the optimized configuration gained an average performance improvement of 47%, 43%, 31%, and 45% for four different types of Spark applications, which indicates that our Spark configuration parameter optimizer could efficiently find the better configuration parameters and improve the performance of various Spark applications.
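The general idea in this abstract, searching a configuration space with Q-learning while a performance model supplies the reward, can be illustrated with a small sketch. This is plain tabular Q-learning, not the authors' improved variant, and it makes several assumptions that are not in the abstract: only two settings are tuned (`spark.executor.memory` and `spark.sql.shuffle.partitions`), the candidate values are invented, and `predicted_runtime` is a hand-written toy function standing in for the deep-neural-network prediction model.

```python
import random

# Hypothetical candidate values for two real Spark settings (values are illustrative).
EXECUTOR_MEMORY_GB = [1, 2, 4, 8]          # spark.executor.memory
SHUFFLE_PARTITIONS = [50, 100, 200, 400]   # spark.sql.shuffle.partitions

# Each action moves one parameter one step up or down its candidate list.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def predicted_runtime(state):
    """Toy stand-in for a learned performance model (assumption, not from the paper)."""
    mem = EXECUTOR_MEMORY_GB[state[0]]
    parts = SHUFFLE_PARTITIONS[state[1]]
    return 100 + 10 * abs(mem - 4) + abs(parts - 200) // 10


def step(state, action):
    """Apply an action, clamping indices to the valid range."""
    i = min(max(state[0] + action[0], 0), len(EXECUTOR_MEMORY_GB) - 1)
    j = min(max(state[1] + action[1], 0), len(SHUFFLE_PARTITIONS) - 1)
    nxt = (i, j)
    # Reward is the predicted runtime saved by the move.
    return nxt, predicted_runtime(state) - predicted_runtime(nxt)


def train(episodes=500, steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy and random start states."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = (rng.randrange(len(EXECUTOR_MEMORY_GB)),
                 rng.randrange(len(SHUFFLE_PARTITIONS)))
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda k: q.get((state, k), 0.0))
            nxt, reward = step(state, ACTIONS[a])
            best_next = max(q.get((nxt, k), 0.0) for k in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def greedy_config(q, start=(0, 0), steps=10):
    """Follow the greedy policy and return the best configuration visited."""
    state = best = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda k: q.get((state, k), 0.0))
        state, _reward = step(state, ACTIONS[a])
        if predicted_runtime(state) < predicted_runtime(best):
            best = state
    return EXECUTOR_MEMORY_GB[best[0]], SHUFFLE_PARTITIONS[best[1]]
```

Calling `greedy_config(train())` returns a `(memory_gb, shuffle_partitions)` pair. On this toy surface the lowest predicted runtime sits at 4 GB and 200 partitions; a real optimizer would replace `predicted_runtime` with measurements or a trained model and would cover far more than two parameters.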
