Automated Data Pre-processing via Meta-learning

Bilalli, Besim; Abelló, Alberto; Aluja‐Banet, Tomàs; Wrembel, Robert

doi:10.1007/978-3-319-45547-1_16

Cited by 13 publications

(4 citation statements)

References 14 publications


“…Over the last few years, a plethora of AutoML systems have been developed providing partial or complete ML automation, such as Auto-sklearn [7], TPOT [8], Auto-WEKA [9], ATM [10], as well as commercial systems such as Google AutoML¹, RapidMiner², DarwinAI³, and DataRobot⁴. These tools range from automatic data preprocessing [19,20], automatic feature engineering [21,22] to automatic model selection [18,23] and automatic hyper-parameters tuning [24,25]. Some approaches attempt to automatically and simultaneously choose a learning algorithm and optimize its hyper-parameters.…”

Section: Automated Machine Learning (mentioning)

confidence: 99%

Towards big industrial data mining through explainable automated machine learning

Garouani, Ahmad, Bouneffa, et al. 2022

Int J Adv Manuf Technol


“…Otherwise, in the worst case, all the transformations may end up having the same impact. For instance, as a first approach we considered simple regression trees [24] as meta-learners, and they suffer from this problem. Their limitation is that they contain a discrete number of leaves, and hence a discrete number of possible predictions.…”

Section: Meta-learner (mentioning)

confidence: 99%
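
The limitation quoted in this statement (a regression tree has a finite number of leaves, so it can return only that many distinct predictions) can be illustrated with a minimal sketch. The tiny one-feature tree below is a hypothetical pure-Python illustration, not the meta-learner used in the cited work; the function names, the depth limit, and the toy data are all assumptions made for the example.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(xs, ys, depth):
    """Fit a one-feature regression tree.

    Returns either a leaf value (the mean target) or a tuple
    (threshold, left_subtree, right_subtree).
    """
    if depth == 0 or len(set(xs)) < 2:
        return mean(ys)
    # Try every threshold that leaves both sides non-empty; keep the best one.
    best_cost, best_t = None, None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    left_pairs = [(x, y) for x, y in zip(xs, ys) if x <= best_t]
    right_pairs = [(x, y) for x, y in zip(xs, ys) if x > best_t]
    return (
        best_t,
        fit_tree([x for x, _ in left_pairs], [y for _, y in left_pairs], depth - 1),
        fit_tree([x for x, _ in right_pairs], [y for _, y in right_pairs], depth - 1),
    )

def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

# Toy data (assumed): 20 inputs, each with a different target "impact".
xs = [i / 10 for i in range(20)]
ys = [0.5 * x for x in xs]

# A depth-2 tree has at most 4 leaves, hence at most 4 distinct predictions.
tree = fit_tree(xs, ys, depth=2)
distinct_predictions = {predict(tree, x) for x in xs}
print(len(set(ys)), "distinct targets,", len(distinct_predictions), "distinct predictions")
```

In this sketch, inputs that fall into the same leaf receive the same predicted value, so candidates with different true targets become indistinguishable, which is the tie problem the statement describes.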

Intelligent assistance for data pre-processing

Bilalli, Abelló, Aluja‐Banet, et al. 2018

Computer Standards & Interfaces

Self Cite


A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. Typically, a dataset needs to be pre-processed before being mined. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives. As a consequence, non-experienced users become overwhelmed with pre-processing alternatives. In this paper, we show that the problem can be addressed by automating the pre-processing with the support of meta-learning. To this end, we analyzed a wide range of data pre-processing techniques and a set of classification algorithms. For each classification algorithm that we consider and a given dataset, we are able to automatically suggest the transformations that improve the quality of the results of the algorithm on the dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results. Postprint (author's final draft)


“…However, in our previous works [2,4,5], we showed that meta-learning can also be used to provide support specifically in the pre-processing step. This can be done by learning the impact of data pre-processing operators on the final result of the analysis.…”

Section: Meta-learning For Data Pre-processing (mentioning)

confidence: 99%