2007
DOI: 10.1109/tsmcb.2007.896406
Scaling Genetic Programming to Large Datasets Using Hierarchical Dynamic Subset Selection

Abstract: The computational overhead of genetic programming (GP) may be directly addressed without recourse to hardware solutions using active learning algorithms based on the random or dynamic subset selection heuristics (RSS or DSS). This correspondence begins by presenting a family of hierarchical DSS algorithms: RSS-DSS, cascaded RSS-DSS, and the balanced block DSS algorithm, where the latter has not been previously introduced. Extensive benchmarking over four unbalanced real-world binary classification problems wit…
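As a minimal sketch of the dynamic subset selection idea the abstract builds on: training cases are sampled with probability growing in their difficulty (how often programs misclassify them) and age (rounds since they were last selected). The weighting exponents, the +1 smoothing, and the use of Efraimidis-Spirakis keys for weighted sampling without replacement are illustrative assumptions, not the paper's exact formulation.

```python
import random

def dss_select(cases, subset_size, d_exp=1.0, a_exp=1.0):
    """One round of difficulty/age-weighted subset selection (DSS sketch).

    Each case dict tracks 'difficulty' (misclassification count) and
    'age' (rounds since last selection). Exponents are illustrative.
    """
    # Weight hard, stale cases more heavily; +1 keeps every weight positive.
    weights = [(c['difficulty'] + 1) ** d_exp + (c['age'] + 1) ** a_exp
               for c in cases]
    # Efraimidis-Spirakis keys: a weighted sample *without* replacement.
    keyed = sorted(((random.random() ** (1.0 / w), i)
                    for i, w in enumerate(weights)), reverse=True)
    picked = {i for _, i in keyed[:subset_size]}
    for i, c in enumerate(cases):
        c['age'] = 0 if i in picked else c['age'] + 1
    return [cases[i] for i in picked]
```

After each generation, the difficulty counters would be updated from the programs' errors on the selected subset, so persistently hard cases keep resurfacing.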

Cited by 43 publications (29 citation statements)
References 12 publications
“…Therefore, the convergence of the algorithm is faster when GA is employed. Subset generation [48] is a method of heuristic search, in which each instance in the search space specifies a candidate solution for subset evaluation. The decision process of this method is determined by some basic issues.…”
Section: Genetic-based Feature Selection (mentioning)
confidence: 99%
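In GA terms, the "each instance in the search space specifies a candidate solution" framing quoted above is commonly realized by encoding a feature subset as a bitmask per individual. A hedged sketch follows; `evaluate` is a hypothetical wrapper criterion (e.g. cross-validated accuracy of a classifier trained on the selected columns), not an API from the cited work.

```python
import random

def init_population(n_features, pop_size):
    """Each individual is a bitmask over features: bit i == 1 means
    'keep feature i', so every point in the search space is one
    candidate feature subset."""
    return [[random.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size)]

def subset_fitness(mask, evaluate):
    """Score one candidate subset. `evaluate` is a hypothetical stand-in
    for the wrapper criterion; empty subsets score zero."""
    selected = [i for i, bit in enumerate(mask) if bit]
    return evaluate(selected) if selected else 0.0

def mutate(mask, rate=0.05):
    """Bit-flip mutation: each bit toggles with probability `rate`."""
    return [1 - b if random.random() < rate else b for b in mask]
```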
“…This problem is related to both the number and size of the training data. In the case of machine learning, research on intelligent sampling methods [34,67] might be necessary to enable the use of the methods. This problem has been already explored in other contexts, such as selecting a subset of instances for the user to label or reducing data sets for evolutionary algorithms fitness computation, and some such well-understood methods could certainly be applied.…”
Section: Challenges Relating To Datasets and Training (mentioning)
confidence: 99%
“…To make sure that all instances are eventually seen by the system, selection based on age is occasionally performed. In order to extend this paradigm to datasets that cannot be entirely cached in memory, hierarchical versions of DSS have also been proposed [30,31]. These algorithms first partition the dataset into blocks that fit entirely in memory then limit subset selection to blocks that have already been cached in order to take advantage of faster read speeds.…”
Section: Genetic Programming On Large Classification Problems (mentioning)
confidence: 99%
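A sketch of the two-level scheme described in that statement: level 1 picks a block small enough to cache in memory, level 2 runs the inner subset selection only over already-cached data, so most rounds hit fast in-memory reads. The 0.9 reuse probability, the eviction rule, and all names here are assumptions for illustration.

```python
import random

def hierarchical_round(block_loaders, cache, subset_size, select_within,
                       max_cached=4):
    """One round of hierarchical (block-then-subset) selection.

    `block_loaders` maps block ids to callables that read a block from
    disk; `cache` maps block ids to loaded data; `select_within` stands
    in for the inner DSS step (e.g. dss_select above)."""
    if cache and random.random() < 0.9:          # mostly reuse cached blocks
        block_id = random.choice(list(cache))
    else:                                        # occasionally fetch a new block
        block_id = random.choice(list(block_loaders))
        if block_id not in cache:
            if len(cache) >= max_cached:         # evict an arbitrary block
                cache.pop(next(iter(cache)))
            cache[block_id] = block_loaders[block_id]()
    return select_within(cache[block_id], subset_size)
```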
“…The balanced block version of the algorithm [30] further refines selection to handle highly unbalanced class distributions.…”
Section: Genetic Programming On Large Classification Problems (mentioning)
confidence: 99%
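A sketch of the balanced-block idea as described: each block is assembled with a per-class quota so minority classes are not swamped by the skew. The equal split and the oversampling-with-replacement rule for short classes are assumptions.

```python
import random

def balanced_block(indices_by_class, block_size):
    """Assemble one block with an (approximately) equal per-class quota.

    `indices_by_class` maps each class label to its list of exemplar
    indices; classes with fewer exemplars than the quota are sampled
    with replacement."""
    quota = block_size // len(indices_by_class)
    block = []
    for label, pool in indices_by_class.items():
        if len(pool) >= quota:
            block.extend(random.sample(pool, quota))     # without replacement
        else:
            block.extend(random.choices(pool, k=quota))  # oversample minority
    random.shuffle(block)
    return block
```

For a 1:100 positive-to-negative skew, `balanced_block({'pos': pos_ids, 'neg': neg_ids}, 1000)` would still yield roughly 500 exemplars of each class.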