In this paper we examine the challenge of producing ensembles of regression models for large datasets. We generate numerous regression models by concurrently executing multiple independent instances of a genetic programming learner. Each instance may be configured with different parameters and a different subset of the training data. We compare several strategies for fusing predictions from multiple regression models. Because each instance has only a small amount of memory, we challenge our framework to learn from small subsets of the training data and still produce a prediction of competitive quality after fusion. Learning from small subsets also shortens the running time, so models of good quality are produced in a timely fashion. Finally, we examine how the quality of the fused predictions evolves as the computation progresses.
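
As an illustration of the general idea only, the following minimal sketch trains several independent models on small random subsets of a dataset and fuses their predictions by mean or median. It is not the method of this paper: ordinary least-squares line fits stand in for the genetic programming learner, the instances run sequentially rather than concurrently, and all names (`fit_line`, `train_ensemble`, `fuse`), the synthetic data, and the parameter values are assumptions made for the example.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept.

    Assumption: this simple fit is a stand-in for one learner instance;
    the paper itself uses a genetic programming learner.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def train_ensemble(data, n_models, subset_size, rng):
    """Train each model on its own small random subset of the data."""
    return [fit_line(rng.sample(data, subset_size)) for _ in range(n_models)]


def fuse(models, x, how="median"):
    """Fuse the individual predictions for x by median or by mean."""
    preds = [model(x) for model in models]
    if how == "median":
        return statistics.median(preds)
    return statistics.fmean(preds)


# Hypothetical synthetic dataset: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5))
        for x in (i / 10 for i in range(1000))]

# 20 models, each seeing only 30 of the 1000 training points.
models = train_ensemble(data, n_models=20, subset_size=30, rng=rng)
fused_mean = fuse(models, 50.0, how="mean")
fused_median = fuse(models, 50.0, how="median")
```

Mean and median are only two of many possible fusion rules; the median is less sensitive to a single badly fitted model, which may matter when each model sees very little data.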