2013
DOI: 10.1007/978-3-642-37207-0_7
Balancing Learning and Overfitting in Genetic Programming with Interleaved Sampling of Training Data

Cited by 60 publications (43 citation statements)
References 13 publications
“…This dataset is composed of 234 instances, each a vector of 627 elements (626 features that identify a drug, followed by the known LD50 value for that drug). For a more detailed description of the datasets, please refer to Archetti et al. [1] and Gonçalves and Silva [11]. The experiments described in this section correspond to a typical machine learning task, and the datasets are divided into two equal parts.…”
Section: Predictive Modeling (mentioning confidence: 99%)
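The 50/50 split described in the excerpt above can be sketched as follows. This is a minimal illustration, not code from the cited papers; the function name and the use of a fixed seed are assumptions, and the zero-filled rows merely stand in for the real LD50 feature vectors.

```python
import random

def split_in_half(instances, seed=0):
    """Shuffle a dataset and split it into two equal parts
    (e.g., training and test sets)."""
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Illustrative stand-in for the LD50 data: 234 rows of 627 values
# (626 features plus the known LD50 target for each drug).
data = [[0.0] * 627 for _ in range(234)]
train, test = split_in_half(data)
print(len(train), len(test))  # 117 117
```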
See 1 more Smart Citation
“…This dataset is composed by 234 instances where each is a vector of of 627 elements (626 features that identify a drug, followed by the known LD50 value for that drug). For a more detailed description of the datasets please refer to Archetti et al [1] and Gonçalves and Silva [11]. The experiments described in this section correspond to a typical machine learning task and the datasets are divided in two equal parts.…”
Section: Predictive Modelingmentioning
confidence: 99%
“…Also, the solutions obtained in the last training stages are not clearly better than those discovered in the beginning, suggesting that the GE variants are not able to evolve reliable and general models. The LD50 dataset is considered particularly challenging, and standard GP-based approaches are unable to learn models with good generalization ability [11]. The way to overcome these difficulties is to rely on specific training strategies that help to control overfitting.…”
Section: Predictive Modeling (mentioning confidence: 99%)
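One such training strategy is the interleaved sampling that gives the indexed paper its title: across generations, fitness evaluation alternates between the full training set and a single randomly drawn training instance, limiting how closely the population can track the whole training sample. The sketch below is an assumed rendering of that idea, not the authors' implementation; the function name and the `p_single` parameter are hypothetical.

```python
import random

def interleaved_fitness_cases(train_set, n_generations, p_single=0.5, seed=0):
    """For each generation, yield the fitness cases to evaluate on:
    with probability p_single a single random training instance,
    otherwise the full training set."""
    rng = random.Random(seed)
    for _ in range(n_generations):
        if rng.random() < p_single:
            yield [rng.choice(train_set)]
        else:
            yield list(train_set)

# Usage: a GP loop would evaluate the population against each yielded set.
train = list(range(10))
for cases in interleaved_fitness_cases(train, n_generations=5):
    print(len(cases))  # either 1 or 10
```

The design intuition is that generations evaluated on a single instance act as a noisy regularizer, while full-set generations keep selection anchored to the overall learning task.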
“…Selecting too many centroids can rob the validation and testing data sets of poorly represented data (e.g., large T mean, U sig, d 0, d 50) and may tend to cause the GP to produce overly complex predictors (e.g., Gonçalves and Silva, 2013; Jensen, 1997, 1998). The selection of too few centroids can leave the testing data with too few data points to capture the variability in the data set (Goldstein et al., 2013).…”
Section: Selection Of Training, Validation, and Testing Data Sets (mentioning confidence: 99%)
“…The interest in studying generalization and overfitting in GP has been recently increasing [1,3,4,6-9]. Geometric Semantic Genetic Programming (GSGP) [13] has also contributed to this rising interest by defining a set of variation operators that have been shown to perform more effectively than the corresponding standard GP operators [4,13].…”
Section: Introduction (mentioning confidence: 99%)