Model Selection and Overfitting in Genetic Programming

Žegklitz, Jan; Pošík, Petr

doi:10.1145/2739482.2764678

Cited by 8 publications

(9 citation statements)

References 12 publications


“…Due to this probable premature undesired stop and the unknown number of the optimum number of generations, this work has performed the training processes during a number of generations which is typically considered as high in the related literature (1000). When the generations had elapsed, instead of returning the model generated in the last generation, the returned model is the validation-best individual, which obtained the lowest error on the validation dataset, similarly to the approach proposed in other studies, called Backwarding (Žegklitz & Pošík, 2015). In this work, the error measurement was performed using the Root Mean Squared Error (RMSE) between the output and predicted values for regression problems, and the misclassification rate for classification problems, as described in Section 4.…”

Section: Methods (mentioning)

confidence: 99%
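The procedure quoted above (train for a fixed generation budget, then return the validation-best individual rather than the last one) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: `run_with_backwarding`, `step`, `train_error` and `valid_error` are hypothetical names, the rule that only the training-best individual of each generation is scored on the validation set is an assumption, and the toy "models" are plain slopes for y = a·x.

```python
import math

def rmse(targets, predictions):
    """Root Mean Squared Error between target and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

def misclassification_rate(labels, predictions):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / len(labels)

def run_with_backwarding(population, step, train_error, valid_error, generations=1000):
    """Evolve for a fixed budget and return the validation-best model seen.

    step, train_error and valid_error are hypothetical callables supplied by
    the caller; they do not belong to any particular GP library.
    """
    best_model, best_error = None, math.inf
    for _ in range(generations):
        population = step(population)
        # Assumption: only the training-best individual of each generation
        # is scored on the validation set.
        candidate = min(population, key=train_error)
        error = valid_error(candidate)
        if error < best_error:
            best_model, best_error = candidate, error
    return best_model, best_error

# Toy usage: each "model" is a slope a, predicting y = a * x.
xs = [1.0, 2.0, 3.0]
train_y = [2.2, 4.4, 6.6]   # training data suggests a slope of 2.2
valid_y = [2.0, 4.0, 6.0]   # validation data suggests a slope of 2.0

# Scripted populations stand in for a real evolutionary step.
script = iter([[1.0, 1.5], [2.0, 1.8], [2.2, 2.1]])
step = lambda _population: next(script)
train_error = lambda a: rmse(train_y, [a * x for x in xs])
valid_error = lambda a: rmse(valid_y, [a * x for x in xs])

best_model, best_error = run_with_backwarding([], step, train_error, valid_error, generations=3)
print(best_model, best_error)  # 2.0 0.0
```

In the toy run the last generation's training-best slope (2.2) fits the training data best, yet the slope from the second generation is returned because it has the lowest validation error.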

“…To prevent overfitting, the use of a validation dataset was also deeply studied (Gagné et al., 2006; Žegklitz & Pošík, 2015). For example, Canary Functions were introduced in a work in which a validation dataset is used to measure overfitting.…”

Section: Related Work (mentioning)

confidence: 99%

“…Other studies using a validation dataset do not stop the evolutionary process, although the returned individual is the one which obtained the best fitness value in the validation dataset. This process is known as Backwarding (Robilliard & Fonlupt, 2001; Žegklitz & Pošík, 2015).…”

Section: Related Work (mentioning)

confidence: 99%


Population subset selection for the use of a validation dataset for overfitting control in genetic programming

Rivero

Fernández-Blanco

Fernández-Lozano

2019

Journal of Experimental & Theoretical Artificial Intelligen


Genetic Programming (GP) is a technique which is able to solve different problems through the evolution of mathematical expressions. However, its tendency to overfit the data is one of the main obstacles to applying it. The use of a validation dataset is a common way to prevent overfitting in many Machine Learning (ML) techniques, including GP. But there is one key point which differentiates GP from other ML techniques: instead of training a single model, GP evolves a population of models. Therefore, the validation dataset can be used in several ways, because any of those evolved models could be evaluated. This work explores the possibility of using the validation dataset not only on the training-best individual but also on a subset containing the training-best individuals of the population. The study has been conducted with 5 well-known databases performing regression or classification tasks. In most of the cases, the results of the study point to an improvement when the validation dataset is used on a subset of the population instead of only on the training-best individual, which also induces a reduction in the number of nodes and, consequently, a lower complexity of the expressions.
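The idea of scoring a subset of the training-best individuals on the validation set, rather than only the single training-best one, might look like the sketch below. The abstract does not spell out how the subset is formed, so taking the top `subset_size` individuals by training error is an assumption here, and `validation_best_of_subset` and the error tables are hypothetical.

```python
def validation_best_of_subset(population, train_error, valid_error, subset_size):
    """Return the validation-best model among the top `subset_size`
    individuals ranked by training error (assumed selection rule)."""
    ranked = sorted(population, key=train_error)
    subset = ranked[:subset_size]
    return min(subset, key=valid_error)

# Toy usage: errors are looked up from hypothetical tables.
population = ["a", "b", "c", "d"]
train_errors = {"a": 0.10, "b": 0.12, "c": 0.30, "d": 0.05}
valid_errors = {"a": 0.20, "b": 0.11, "c": 0.09, "d": 0.40}

only_training_best = validation_best_of_subset(
    population, train_errors.get, valid_errors.get, subset_size=1)
from_top_three = validation_best_of_subset(
    population, train_errors.get, valid_errors.get, subset_size=3)
print(only_training_best, from_top_three)  # d b
```

With `subset_size=1` the sketch reduces to the usual training-best check; a larger subset lets an individual that is slightly worse on training data but better on validation data be selected.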


“…Descriptive characteristics of the variables in the regression model analyzed with the genetic algorithm method are presented in Table 4a and Table 4b; the predicted variable is reading skills, and the predictor variables are gender, mother's and father's education level, internet use at home, language spoken at home, number of e-book readers owned, type and number of books at home, speed of responding to items measuring reading skills, weekly class hours allocated to reading skills at school, and grade level. In methods based on machine learning, such as the genetic algorithm method, the data set is split into two subsets, a training set and a test set, in order to prevent overfitting, which is described as the problem of producing biased errors in estimates (Kushchu, 2002; Žegklitz & Pošík, 2015). For this reason, the analyses were carried out after splitting the data in two, with 70% as the training set (n=3663) and 30% as the test set (n=1569) (Ahmed & Elaraby, 2014; Pahmi, Saepudin, Maesarah, Solehudin & Wulandari, 2018).…”

Section: Findings (unclassified)
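The 70/30 split described in this statement can be illustrated with a short sketch. The statement only gives the proportions and the resulting counts, so the shuffling, the seed, and rounding the training size up are assumptions; `split_train_test` is a hypothetical helper.

```python
import math
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into a training and a test subset.

    Rounding the training size up is an assumption; it happens to
    reproduce the counts reported in the statement above.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = math.ceil(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# 5232 is the 3663 + 1569 total reported in the statement.
train, test = split_train_test(range(5232))
print(len(train), len(test))  # 3663 1569
```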

“…Regression analysis has three basic uses: selecting predictor variables, determining the function that defines the regression model, and estimating the parameters of the model (Paterlini & Minerva, 2010; Yang, Chuang, Jeng & Tao, 2011). Genetic algorithms are a technique used to build the best model in regression modeling or to select the best predictor variables for a model (Vasant, 2013; Žegklitz & Pošík, 2015). Genetic algorithms are based on a heuristic search approach; they are a probability-based method, inspired by the biological evolution of living organisms, that searches according to the mechanics of natural selection and natural genetics.…”

Section: Introduction (unclassified)

Okuma Becerilerini Yordayan Özelliklerin Belirlenmesi: Genetik Algoritma Kestirimi (Determining the Characteristics That Predict Reading Skills: A Genetic Algorithm Estimation)

Aydoğan

Gelbal

2022

Ana Dili Eğitimi Dergisi


This study aimed to determine the characteristics that predict students' reading skills. The study group consisted of 5232 fifteen-year-old students from 42 different countries who participated in PISA 2015. The research data were obtained from the PISA 2015 program data and were analyzed using a regression model based on genetic algorithm estimation. The genetic algorithm method was used to perform variable selection for a regression model composed of the variables that best predict reading skills. According to the results, gender, father's education level, internet use at home, language spoken at home, number of e-book readers owned, speed of responding to items measuring reading skills, and the variety and number of books at home were found to predict students' reading skills at a statistically significant level. It was concluded that variation in the variables with a significant level of prediction also leads to significant variation in students' reading skills.


Difficult first strategy GP: an inexpensive sampling technique to improve the performance of genetic programming

Ali

Majeed

2020

Evol. Intel.

