In this paper, we address symbolic regression, a well-known and intensively studied problem with a wide range of applications.
To solve it, we propose an efficient metaheuristic-based approach called RILS-ROLS. RILS-ROLS is built on two elements: (i) iterated local search, which is the backbone of the method and mainly handles the combinatorial and some continuous aspects of the problem; and (ii) the ordinary least squares (OLS) method, which handles the continuous aspect of the search space by efficiently determining the best-fitting coefficients of linear combinations within solution equations. In addition, we introduce a novel fitness function that combines three important model quality measures: the $R^2$ score, the RMSE score, and the size of the model (or model complexity). We also introduce a carefully designed local search, which allows a systematic search in the proximity of a candidate solution.
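As an illustration of the OLS element described above, the following is a minimal sketch, not the RILS-ROLS implementation. It assumes a candidate equation that is a linear combination of fixed nonlinear terms, and it computes the two accuracy measures named above. The function names `fit_linear_coefficients` and `r2_and_rmse`, and the representation of terms as Python callables, are hypothetical choices made for this example; how the paper combines the measures into its fitness function is not shown here.

```python
import numpy as np


def fit_linear_coefficients(terms, X, y):
    """Fit y ~ c0 + sum_i c_i * term_i(X) by ordinary least squares.

    `terms` is a list of callables (hypothetical representation), each mapping
    the input matrix X of shape (n_samples, n_vars) to a vector of length n_samples.
    """
    # Design matrix: a column of ones for the intercept, then one column per term.
    A = np.column_stack([np.ones(len(y))] + [t(X) for t in terms])
    # lstsq returns the coefficients minimising the squared residual norm.
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs


def r2_and_rmse(y, y_pred):
    """Return the coefficient of determination R^2 and the RMSE."""
    residual = y - y_pred
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residual ** 2))
    return r2, rmse
```

In this sketch the structure of the equation (which terms appear) is held fixed and only the coefficients are fitted; in the approach described above, choosing that structure is the task of the iterated local search.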
Experiments are conducted on two well-known ground-truth benchmark sets, Feynman and Strogatz. RILS-ROLS was compared to 14 competitors from the literature. Our method outperformed all 14 competitors in terms of accuracy in recovering the true symbolic model under varying levels of noise. It also proved to be the most efficient method in terms of the average running time needed to reach the exact model. In addition to the evaluation on known ground-truth datasets, we introduce a new, randomly generated set of problem instances, called Random. The goal of the Random dataset is to test the scalability of our method with respect to increasing equation size and number of variables, under different levels of noise.
A statistical analysis of the experimental results confirmed that RILS-ROLS can be considered a new state-of-the-art method for symbolic regression when ground-truth equations are known.