Summary
Gene expression programming (GEP) is one of the most effective function mining algorithms for fitting a mathematical equation to an input dataset. However, GEP becomes inefficient in big data processing because its evolution incurs a large overhead on large‐scale data. To address this issue, this paper presents two parallelized GEP algorithms using MapReduce. The first algorithm, based on data separation, aims to speed up large‐scale classification. However, it lacks the ability to output the mined equation explicitly. The second parallelized GEP algorithm therefore builds on further improvements to the first; it mines the equation efficiently and also outputs it explicitly and directly. The experimental results show that both algorithms are effective for processing large volumes of data.
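
As a rough illustration of the data separation idea, the sketch below scores one candidate equation by computing a partial error on each data chunk (map) and then combining the partial results (reduce). It is a minimal single-process sketch under stated assumptions, not the algorithms presented in this paper: the mean squared error fitness, the strided split, and the names `split_data`, `map_partial_error`, `reduce_errors`, and `mse_fitness` are all hypothetical, and a real deployment would run the map step on a MapReduce cluster rather than in a local list comprehension.

```python
from functools import reduce


def split_data(samples, n_parts):
    """Data separation: split samples into n_parts strided chunks (assumed scheme)."""
    return [samples[i::n_parts] for i in range(n_parts)]


def map_partial_error(candidate, chunk):
    """Map step: summed squared error and sample count for one chunk."""
    sq_err = sum((candidate(x) - y) ** 2 for x, y in chunk)
    return sq_err, len(chunk)


def reduce_errors(a, b):
    """Reduce step: add two (squared error, count) pairs."""
    return a[0] + b[0], a[1] + b[1]


def mse_fitness(candidate, chunks):
    """Mean squared error of a candidate over all chunks (assumed fitness)."""
    # In a cluster setting each chunk would be handled by a separate mapper.
    partials = [map_partial_error(candidate, c) for c in chunks]
    total_err, total_n = reduce(reduce_errors, partials, (0.0, 0))
    return total_err / total_n


# Toy dataset generated from y = 2x + 1 (illustrative only, not experimental data).
samples = [(x, 2 * x + 1) for x in range(10)]
chunks = split_data(samples, 3)

exact = lambda x: 2 * x + 1   # candidate that matches the data
off_by_one = lambda x: 2 * x  # candidate that misses every sample by 1

print(mse_fitness(exact, chunks))
print(mse_fitness(off_by_one, chunks))
```

Because each chunk contributes only a pair of sums, the reduce step stays cheap regardless of how many samples a chunk holds, which is what makes this kind of fitness evaluation a natural fit for data separation.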