2022
DOI: 10.48550/arxiv.2205.01311
Preprint

Automatically Debugging AutoML Pipelines using Maro: ML Automated Remediation Oracle (Extended Version)

Julian Dolby,
Jason Tsay,
Martin Hirzel

Abstract: Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters must be correctly configured. Unfortunately, it is quite common for certain combinations of datasets, operators, or hyperparameters to cause failures. Diagnosing and fixing those failures is tedious and error-prone and can seriously derail a data scientist's workflow. This p…

Citations: cited by 1 publication (3 citation statements)
References: 17 publications
“…The result is a remediated pipeline (8), which the data scientist can inspect directly if they so wish (9). Alternatively, to make the fix easier to understand, the data scientist can send the remediated pipeline and the original pipeline to Maro's explainer component (10). This explains the remediation to the data scientist by rendering it in natural language (11).…”
Section: Tool Overview | mentioning
confidence: 99%
“…All planned pipelines use common ML operators, mostly from scikit-learn [26], such as LogisticRegression, or OneHotEncoder, but also operators from other scikit-learn compatible libraries, such as a bias mitigator from AIF360 [7] and gradient-boosted trees from LightGBM [21]. The full list of pipelines is available in the extended version of this paper [10].…”
Section: Additional Use Cases | mentioning
confidence: 99%