Analysis of Schema Frequencies in Genetic Programming

Burlacu, Bogdan; Affenzeller, Michael; Kommenda, Michael; Kronberger, Gabriel; Winkler, Stephan

doi:10.1007/978-3-319-74718-7_52

Cited by 2 publications (2 citation statements)

References 9 publications

“…Recently, Burlacu et al (2015, 2018a, 2018b) evaluated the BB hypothesis for GP empirically. They performed schema analyses on GP populations and identified schemata with an above-average quality as well as an increasing frequency in the populations over multiple generations.…”

Section: Related Work (mentioning)

confidence: 99%

“…Goldberg et al (2001) and Reeves (1993) argue that an initial supply of BBs is necessary for the search to allow for the possibility that high-quality BBs will take over the population in later generations (BB growth). Recent work of Burlacu et al (2015, 2018a, 2018b) evaluates this hypothesis for GP. The authors show that building blocks have a large influence on the evolutionary process.…”

Section: Introduction (mentioning)

confidence: 99%
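The schema analysis described in these excerpts amounts to counting how often a tree pattern occurs in a population and comparing the fitness of the individuals that contain it with the population average. The Python sketch below illustrates that idea under illustrative assumptions: trees are immutable label/children nodes, and a hypothetical wildcard symbol `#` matches any subtree. The names `Node`, `n`, `contains_schema`, and `schema_stats` are hypothetical, and this is not the schema definition or implementation used by Burlacu et al.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence, Tuple

WILDCARD = "#"  # hypothetical wildcard symbol: matches any subtree


@dataclass(frozen=True)
class Node:
    """A GP expression tree node: a function/terminal label plus children."""
    label: str
    children: Tuple["Node", ...] = ()


def n(label: str, *children: Node) -> Node:
    """Convenience constructor for building trees."""
    return Node(label, tuple(children))


def matches_at(schema: Node, tree: Node) -> bool:
    """True if the schema matches the tree rooted exactly at this node."""
    if schema.label == WILDCARD:
        return True
    if schema.label != tree.label or len(schema.children) != len(tree.children):
        return False
    return all(matches_at(s, t) for s, t in zip(schema.children, tree.children))


def subtrees(tree: Node) -> Iterator[Node]:
    """Yield every subtree in pre-order, including the tree itself."""
    yield tree
    for child in tree.children:
        yield from subtrees(child)


def contains_schema(schema: Node, tree: Node) -> bool:
    """True if the schema matches at any position in the tree."""
    return any(matches_at(schema, t) for t in subtrees(tree))


def schema_stats(schema: Node, population: Sequence[Node], fitness: Sequence[float]):
    """Return (frequency, mean fitness of matching individuals, population mean fitness)."""
    matching = [f for t, f in zip(population, fitness) if contains_schema(schema, t)]
    frequency = len(matching) / len(population)
    mean_all = sum(fitness) / len(fitness)
    mean_matching = sum(matching) / len(matching) if matching else float("nan")
    return frequency, mean_matching, mean_all


if __name__ == "__main__":
    population = [
        n("+", n("x"), n("1")),
        n("*", n("+", n("x"), n("1")), n("x")),
        n("-", n("x"), n("y")),
        n("+", n("y"), n("1")),
    ]
    fitness = [1.0, 2.0, 0.0, 3.0]
    schema = n("+", n(WILDCARD), n("1"))  # pattern (+ # 1)
    print(schema_stats(schema, population, fitness))
```

Here three of the four trees contain the pattern (+ # 1), and their mean fitness is above the population mean. Recording these two numbers for each generation is one way to look for the behaviour the excerpts describe: schemata of above-average quality whose frequency increases over time.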

On sampling error in genetic programming

2021

The initial population in genetic programming (GP) should form a representative sample of all possible solutions (the search space). While large populations approximate the distribution of possible solutions accurately, small populations tend to introduce sampling error. This paper analyzes how the size of a GP population affects the sampling error and contributes to answering the question of how to size initial GP populations. First, we present a probabilistic model of the expected number of subtrees for GP populations initialized with full, grow, or ramped half-and-half. Second, based on this frequency model, we present a model that estimates the sampling error for a given GP population size. We validate our models empirically and show that, compared to smaller population sizes, our recommended population sizes substantially reduce the sampling error of measured fitness values. Increasing the population size further, however, does not considerably reduce the sampling error of fitness values. Last, we recommend population sizes for some widely used benchmark problem instances that result in a low sampling error. A low sampling error at initialization is necessary (but not sufficient) for a reliable search, since lowering the sampling error reduces the overall random variation in a random sample. Our results indicate that sampling error is a severe problem for GP, making large initial population sizes necessary to obtain a low sampling error. Our model allows GP practitioners to determine a minimum initial population size such that the sampling error stays below a threshold at a given confidence level.
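The idea of choosing a minimum population size so that the sampling error stays below a threshold at a given confidence level can be illustrated with the textbook sample-size bound for estimating a proportion, n ≥ z² p(1 − p) / ε², applied for example to the fraction of individuals that contain a given subtree. The Python sketch below uses that generic normal-approximation bound as a stand-in; it is not the subtree-frequency model from the abstract, and the function name `min_population_size` and its worst-case default p = 0.5 are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def min_population_size(max_error: float, confidence: float, p: float = 0.5) -> int:
    """Hypothetical helper: smallest n such that a sample proportion estimates p
    within +/- max_error at the given two-sided confidence level.

    Generic normal-approximation bound n >= z^2 * p * (1 - p) / e^2;
    p = 0.5 is the worst case when the true proportion is unknown.
    This is NOT the subtree-frequency model described in the abstract.
    """
    if not (0.0 < max_error < 1.0 and 0.0 < confidence < 1.0 and 0.0 <= p <= 1.0):
        raise ValueError("max_error and confidence must be in (0, 1), p in [0, 1]")
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)  # two-sided critical value
    return math.ceil(z * z * p * (1.0 - p) / max_error ** 2)


if __name__ == "__main__":
    for err in (0.10, 0.05, 0.01):
        print(err, min_population_size(err, confidence=0.95))
```

With p = 0.5 and 95 % confidence, a ±5 % error bound requires 385 individuals and a ±1 % bound requires 9604. Because the required size grows with 1/ε², each further reduction of the error costs disproportionately more individuals. This is one way to read the abstract's observation that increasing the population size beyond the recommended values yields little additional reduction in sampling error.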
