2018
DOI: 10.1007/978-3-319-97088-2_14

Identifying and Harnessing the Building Blocks of Machine Learning Pipelines for Sensible Initialization of a Data Science Automation Tool

Abstract: As data science continues to grow in popularity, there will be an increasing need to make data science tools more scalable, flexible, and accessible. In particular, automated machine learning (AutoML) systems seek to automate the process of designing and optimizing machine learning pipelines. In this chapter, we present a genetic programming-based AutoML system called TPOT that optimizes a series of feature preprocessors and machine learning models with the goal of maximizing classification accuracy on a super…

Cited by 4 publications (3 citation statements)
References 20 publications
“…Beyond using GP to perform the machine learning itself, recent work has shown that GP can also be harnessed to optimize a sequence of existing data analysis and machine learning operations on a dataset to maximize the predictive performance of the final machine learning model [30,35]. For example, TPOT 4 is an early prototype that uses GP to optimize a sequence of scikit-learn operations for both classification and regression problems [25][26][27], and has been shown to work quite well across a broad range of application domains ranging from epidemiological studies to image classification to time series prediction [23]. Given the general design of TPOT, the operations it optimizes over can be specialized for particular problem domains.…”
Section: Discussion and Future Work
Type: mentioning
confidence: 99%
“…Users can also customize and modify the code snippets according to their needs and preferences. Aliro also includes a pre-trained machine learning recommendation system that can assist users to automate the selection of machine learning algorithms and their hyperparameters, as well as provide visualization of the evaluated model and data (Olson and Moore 2016).…”
Section: Introduction
Type: mentioning
confidence: 99%
“…While this was disabled here to prevent information leak (as the meta-learning likely involved datasets used here), this could show benefit for novel datasets. This has also been explored with TPOT [124] but is not yet included in the stable release.…”
Section: Further Analysis and Recommendations
Type: mentioning
confidence: 99%