2017
DOI: 10.1002/cpe.4379

MapReduce‐based parallel GEP algorithm for efficient function mining in big data applications

Abstract: Gene expression programming (GEP) algorithm is one of the most effective function mining algorithms in enabling the mathematical equation fitting for the input dataset. However, GEP algorithm encounters low efficiency issue in big data processing due to large overhead in its evolution when it handles the large‐scale data. In order to solve the issue, this paper presents two parallelized GEP algorithms using MapReduce. Based on data separation, the first algorithm aims at speeding up the large‐scale cla…

Cited by 5 publications (6 citation statements)
References 21 publications
“…The differences between Spark and Hadoop in intermediate data buffer result in high performance of iterative applications and interactive data mining with Spark. 17,25 Dharanipragada et al proposed Generate-Map-Reduce (GMR), which was an extension to MapReduce, to support iterative jobs and a distributed communication model by using shared data structures. GMR captured recursive computations by modeling iterative applications, such as simulated annealing and A* search.…”
Section: 2 (mentioning)
confidence: 99%
“…Li and Shen evaluated the handling platform between local and remote file systems for a given application. 4,25 Samadi et al compared the performance according to the criteria execution time, throughput, and speedup. 6 They had evaluated the performance observed by Spark is higher than Hadoop.…”
Section: Related Work Comparisons (mentioning)
confidence: 99%
“…However, the authors still report that low efficiency issue occurs when the algorithms are dealing with the large-volume load data due to the algorithm overhead. As a result, Liu et al (2016), Liu et al (2017), and finally introduce the distributed computing to improve the efficiency of the large-scale load data classification. The authors report that because of the difficulties in the algorithm decoupling, the ensemble learning technology is a necessary tool to implement algorithm parallelization.…”
Section: Introduction (mentioning)
confidence: 99%
“…There have been many attempts to improve its performance, especially for data characterized by massive amount. For example Liu et al 2 considered the parallelizing GEP algorithm to enable large-scale classification, using majority-voting to combine a number of GEP-based classifiers obtained for separate data chunks.…”
Section: Introduction (mentioning)
confidence: 99%