Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation 2015
DOI: 10.1145/2739482.2764678

Model Selection and Overfitting in Genetic Programming

Abstract: Genetic Programming has been very successful in solving a wide range of problems, but its use as a machine learning algorithm has been limited so far. One of the reasons is the problem of overfitting, which cannot be solved or suppressed as easily as in more traditional approaches. Another problem, closely related to overfitting, is the selection of the final model from the population. In this article we present our research that addresses both problems: overfitting and model selection. We compare several ways of …

Citations: cited by 8 publications (9 citation statements)
References: 12 publications
“…Due to this probable premature undesired stop and the unknown number of the optimum number of generations, this work has performed the training processes during a number of generations which is typically considered as high in the related literature (1000). When the generations had elapsed, instead of returning the model generated in the last generation, the returned model is the validation-best individual, which obtained the lowest error on the validation dataset, similarly to the approach proposed in other studies, called Backwarding (Žegklitz & Pošík, 2015). In this work, the error measurement was performed using the Root Mean Squared Error (RMSE) between the output and predicted values for regression problems, and the misclassification rate for classification problems, as described in Section 4.…”
Section: Methods (mentioning)
confidence: 99%
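The selection rule described in the statement above (train for a fixed number of generations, then return the individual with the lowest validation error rather than the last one) can be sketched as follows. This is a minimal illustration, not the cited paper's implementation: the names `rmse` and `validation_best`, the representation of a model as a plain Python callable, and the toy candidates are all assumptions made for the example.

```python
import math
from typing import Callable, Sequence, Tuple

# Assumption: a model is any callable mapping one input value to a prediction.
Model = Callable[[float], float]


def rmse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Root Mean Squared Error between predictions and target values."""
    squared = [(model(x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def validation_best(
    per_generation_best: Sequence[Model],
    val_x: Sequence[float],
    val_y: Sequence[float],
) -> Tuple[int, Model, float]:
    """Return (generation, model, error) with the lowest validation RMSE.

    per_generation_best holds one candidate per generation, for example
    the individual with the best training fitness in that generation.
    """
    best_gen, best_model, best_err = -1, None, math.inf
    for gen, model in enumerate(per_generation_best):
        err = rmse(model, val_x, val_y)
        if err < best_err:  # strict comparison keeps the earliest best
            best_gen, best_model, best_err = gen, model, err
    return best_gen, best_model, best_err


# Hypothetical candidates: the last one mimics an overfitted individual.
candidates = [
    lambda x: 0.0,
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 0.5 * x ** 3,
]
val_x = [0.0, 1.0, 2.0]
val_y = [0.0, 2.0, 4.0]

best_gen, best_model, best_err = validation_best(candidates, val_x, val_y)
print(best_gen, best_err)
```

In this toy run the candidate from the middle generation is returned even though a later generation exists, which is the point of selecting on validation error instead of taking the final individual.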
“…To prevent overfitting, the use of a validation dataset was also deeply studied (Gagné et al., 2006; Žegklitz & Pošík, 2015). For example, Canary Functions were introduced in a work in which a validation dataset is used to measure overfitting.…”
Section: Related Work (mentioning)
confidence: 99%
“…Descriptive characteristics of the variables in the regression model analyzed with the genetic algorithms method are presented in Table 4a and Table 4b. The predicted variable is reading skills; the predictor variables are gender, mother's and father's education level, internet use at home, language spoken at home, number of e-book readers owned, type and number of books at home, speed of responding to the items measuring reading skills, weekly class hours allocated to reading skills at school, and grade level. In machine-learning-based methods such as genetic algorithms, the dataset is divided into two subsets, a training set and a test set, to prevent overfitting, which is described as the problem of producing biased errors in estimates (Kushchu, 2002; Žegklitz & Pošík, 2015). For this reason, the analyses were carried out by splitting the data in two, with 70% as the training set (n=3663) and 30% as the test set (n=1569) (Ahmed & Elaraby, 2014; Pahmi, Saepudin, Maesarah, Solehudin, & Wulandari, 2018).…”
Section: Findings (unclassified)
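The 70%/30% split reported in the statement above can be reproduced in size with a short sketch. The quoted study does not say how the split was drawn or rounded, so the shuffling, the fixed seed, the floor rounding of the test share, and the function name `split_indices` are assumptions for illustration only.

```python
import random
from typing import List, Tuple


def split_indices(
    n: int, test_percent: int = 30, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices 0..n-1 and split them into train and test parts.

    Assumption: the test share is rounded down and the remainder goes to
    the training set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, reproducible
    n_test = n * test_percent // 100      # integer arithmetic, floor
    return indices[n_test:], indices[:n_test]


# 3663 + 1569 = 5232 observations in total, as reported in the statement.
train_idx, test_idx = split_indices(5232)
print(len(train_idx), len(test_idx))
```

Under these assumptions the resulting sizes match the reported n=3663 and n=1569.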
“…Regression analyses have three basic uses: selecting the predictor variables, determining the function that defines the regression model, and estimating the parameters of the model (Paterlini & Minerva, 2010; Yang, Chuang, Jeng, & Tao, 2011). Genetic algorithms are a technique used to build the best model in regression models or to select the best predictor variables for the model (Vasant, 2013; Žegklitz & Pošík, 2015). Genetic algorithms are based on a heuristic search approach; inspired by the biological development of living organisms, they are a probability-based method that searches according to the mechanics of natural selection and natural genetics.…”
Section: Introduction (unclassified)