Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation 2013
DOI: 10.1145/2463372.2463506
Learning regression ensembles with genetic programming at scale

Abstract: In this paper we examine the challenge of producing ensembles of regression models for large datasets. We generate numerous regression models by concurrently executing multiple independent instances of a genetic programming learner. Each instance may be configured with different parameters and a different subset of the training data. Several strategies for fusing predictions from multiple regression models are compared. To overcome the small memory size of each instance, we challenge our framework to learn fro…
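The setup the abstract describes can be sketched minimally as follows. This is an illustration only: the stand-in learner below is a simple least-squares line fit, not the paper's GP learner, and all names are hypothetical; what it shows is several independent instances, each trained on its own random subset of the data, fused by averaging.

```python
import random

def train_instance(data, seed):
    """Stand-in for one independent learner instance: fits a line by
    least squares on its own private subsample (illustrative only;
    the paper evolves models with genetic programming instead)."""
    rng = random.Random(seed)
    subset = rng.sample(data, k=max(2, len(data) // 2))  # this instance's data subset
    n = len(subset)
    mx = sum(x for x, _ in subset) / n
    my = sum(y for _, y in subset) / n
    sxx = sum((x - mx) ** 2 for x, _ in subset)
    sxy = sum((x - mx) * (y - my) for x, y in subset)
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x  # the trained regression model

# Toy training data: y = 2x + 1
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
# Launch several independent instances, each with its own seed and subset
ensemble = [train_instance(data, seed) for seed in range(5)]
# Fuse the individual predictions by simple averaging
prediction = sum(m(10.0) for m in ensemble) / len(ensemble)
```

Because each instance only ever sees its own subsample, the instances can run concurrently with no shared state, which is the property the paper exploits at scale.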

Cited by 17 publications (12 citation statements)
References 23 publications
“…Another important issue is the comparison against ensembles of trees [14]. Ensembles usually rely on model performance; therefore, the best models are selected to generate a fused prediction.…”
Section: Discussion
confidence: 99%
“…The present work may be more related to ensembles of trees [14], a method in which several models (trees) are independently evolved, and then combined to provide a better forecast. The final fused prediction may be the average of the individual predictions, a weighted average, or another statistic.…”
Section: Related Work
confidence: 99%
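The fusion statistics mentioned in the statement above can be sketched concretely. This is a hedged illustration, not the cited paper's exact scheme: the validation errors used to derive the weights are made-up numbers, and inverse-error weighting is one common choice among many.

```python
import statistics

predictions = [3.1, 2.9, 3.0, 3.4]      # one prediction per evolved model
val_errors  = [0.10, 0.05, 0.08, 0.40]  # hypothetical per-model validation errors

# Simple average of the individual predictions
mean_pred = sum(predictions) / len(predictions)

# Weighted average: more accurate models (lower error) get larger weights
weights = [1.0 / e for e in val_errors]
weighted_pred = sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

# Another statistic: the median, which is robust to one bad model
median_pred = statistics.median(predictions)
```

Note how the weighted average pulls the fused value toward the low-error models, while the median discounts the outlying prediction entirely.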
“…Experiments showed the validity of the approach when compared to standard techniques for the task at hand. Other applications of ensemble methods to GP include the use of querying-by-committee methods [26,2] and of a divide-and-conquer strategy, in which a solution needs to work well only on a subset of the entire training set [31,1]. With respect to ensembles of regression models, a quite recent contribution was proposed in [38]. The idea explored by the authors was to generate several regression models by concurrently executing multiple independent instances of a GP and, subsequently, to analyze several strategies for fusing predictions from the multiple regression models.…”
Section: Related Work
confidence: 99%
“…The study considered only small datasets due to memory constraints, but the authors were able to draw interesting conclusions about the suitability of their approach in producing accurate predictions. Our study will differ from the one described in [38] in several ways: we do not put any constraint on the size of the datasets, we will consider models produced by different GP algorithms (a blend of STGP and GSGP), and we define and use different similarity-based criteria that, by taking into account information related to all the evolved populations, aim at improving the generalization ability of the final ensemble as well as reducing the computational effort. Hence, in the experiments described in this contribution and as explained in Section 3, the evolved populations are not independent of each other.…”
Section: Related Work
confidence: 99%
“…A windowing method that divides the training dataset into non-overlapping strata preserving the same class distribution, named ILAS, was presented in [11]. Note that other types of techniques, such as subgroup discovery [20], frequent pattern mining in data streams [24], and regression models [62], have also been tackled by windowing mechanisms.…”
Section: Related Work
confidence: 99%
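The ILAS-style windowing described in the statement above — non-overlapping strata that each preserve the original class distribution — can be sketched as follows. The function name and the round-robin dealing are illustrative assumptions, not the exact procedure from [11].

```python
from collections import defaultdict

def stratified_windows(samples, labels, n_windows):
    """Split (sample, label) pairs into n non-overlapping windows,
    dealing each class's examples round-robin so every window keeps
    roughly the original class distribution (illustrative sketch)."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    windows = [[] for _ in range(n_windows)]
    for y, items in by_class.items():
        for i, s in enumerate(items):
            windows[i % n_windows].append((s, y))  # round-robin within each class
    return windows

# 6 examples of class "a" and 3 of class "b": a 2:1 distribution
labels = ["a"] * 6 + ["b"] * 3
samples = list(range(9))
wins = stratified_windows(samples, labels, 3)
# Each of the 3 windows receives 2 of class "a" and 1 of class "b",
# preserving the 2:1 ratio while keeping the windows disjoint
```

Because the windows are disjoint, each learner pass touches only a fraction of the data, which is what makes windowing attractive for the large-dataset setting discussed throughout this report.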