Summary
The big data era has resulted in the development of several data analysis tools. Spark is a type of in‐memory processing fitted iteration and interactive data mining tool. This tool possesses higher data‐processing performance than MapReduce, which is an offline storage mechanism. However, some disadvantages of in‐memory processing, such as massive in‐memory data requirements, cause cross‐node data transfer that result in a long computation time. The performance of the process can be improved if the in‐memory process is executed with fewer shuffle instructions. Therefore, this study aims to enhance the performance of iterative application through instruction replacement. Three empirical research cases with diverse datasets and iterations are used to modify the program. We adopt a strategy of downloading a small resilient distributed dataset and replacing the shuffle‐included instructions to shorten the processing time with an automated code replacement by using exhaustively code matching. The experimental results reveal an improvement of up to 39% in the execution time compared with the existing in‐memory processing programs with various dataset sizes.