In this paper we examine the challenge of producing ensembles of regression models for large datasets. We generate numerous regression models by concurrently executing multiple independent instances of a genetic programming learner. Each instance may be configured with different parameters and a different subset of the training data. We compare several strategies for fusing predictions from multiple regression models. Because each instance has limited memory, we challenge our framework to learn from small subsets of the training data and still produce a prediction of competitive quality after fusion. Learning from small subsets also shortens training time, so models of good quality are produced in a timely fashion. Finally, we examine how the quality of the fused predictions changes as the computation progresses.
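To make the idea of fusing predictions concrete, the following is a minimal sketch of three common fusion rules: the mean, the median, and an inverse-error weighted average. The abstract does not say which strategies the paper compares, so these rules, the function names, and the `val_errors` input (one validation error per model) are illustrative assumptions rather than the authors' method.

```python
from statistics import mean, median

# Assumption: `predictions` holds one list per model, and each inner list
# holds that model's prediction for every exemplar, in the same order.

def fuse_mean(predictions):
    """Average the models' predictions for each exemplar."""
    return [mean(per_exemplar) for per_exemplar in zip(*predictions)]

def fuse_median(predictions):
    """Take the median prediction per exemplar (robust to outlier models)."""
    return [median(per_exemplar) for per_exemplar in zip(*predictions)]

def fuse_weighted(predictions, val_errors):
    """Weight each model by the inverse of its (hypothetical) validation error."""
    # The small constant avoids division by zero for a perfect model.
    weights = [1.0 / (err + 1e-12) for err in val_errors]
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, per_exemplar)) / total
        for per_exemplar in zip(*predictions)
    ]

if __name__ == "__main__":
    # Three models, two exemplars each.
    preds = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
    print(fuse_mean(preds))
    print(fuse_median(preds))
    print(fuse_weighted(preds, val_errors=[0.5, 1.0, 2.0]))
```

The median ignores a single badly wrong model, while the weighted average needs held-out data to estimate each model's error.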
We describe FlexGP, the first Genetic Programming system to perform symbolic regression on large-scale datasets in the cloud via massive data-parallel ensemble learning. FlexGP provides a decentralized, fault-tolerant parallelization framework that runs many copies of Multiple Regression Genetic Programming, a sophisticated symbolic regression algorithm, in the cloud. Each copy executes with a different sample of the data and different parameters. The framework can create a fused model or ensemble on demand while the individual GP learners are still evolving. We demonstrate our framework by deploying 100 independent GP instances in a massive data-parallel manner to learn from a dataset composed of 515K exemplars and 90 features, and by generating a competitive fused model in less than 10 minutes.
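The data-parallel setup described here, where each learner receives its own data sample and its own parameters, could be sketched as follows. This is not FlexGP's code: the function name, the 10% sample fraction, and the parameter names `population_size` and `mutation_rate` are hypothetical choices made only for illustration.

```python
import random

def make_learner_configs(n_rows, n_learners, sample_fraction=0.1, seed=0):
    """Build one configuration per learner (hypothetical helper).

    Each learner gets a different random subset of row indices and a
    different draw of (assumed) GP parameters.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    sample_size = max(1, int(n_rows * sample_fraction))
    configs = []
    for learner_id in range(n_learners):
        configs.append({
            "learner_id": learner_id,
            # Sample rows without replacement within a single learner.
            "row_indices": sorted(rng.sample(range(n_rows), sample_size)),
            # Assumed parameter names and ranges, not taken from the paper.
            "population_size": rng.choice([500, 1000, 2000]),
            "mutation_rate": rng.uniform(0.05, 0.3),
        })
    return configs

if __name__ == "__main__":
    for cfg in make_learner_configs(n_rows=1000, n_learners=3):
        print(cfg["learner_id"], len(cfg["row_indices"]),
              cfg["population_size"], round(cfg["mutation_rate"], 3))
```

Because the learners share no state, each configuration can be shipped to a separate machine, and a fused model can be built from whichever learners have reported results so far.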