2018
DOI: 10.1007/978-3-319-92901-9_6

PRESISTANT: Data Pre-processing Assistant

Cited by 8 publications (10 citation statements)
References: 17 publications
“…Over the last few years, a plethora of AutoML systems have been developed providing partial or complete ML automation, such as Auto-sklearn [7], TPOT [8], Auto-WEKA [9], ATM [10], as well as commercial systems such as Google AutoML 1, RapidMiner 2, DarwinAI 3, and DataRobot 4. These tools range from automatic data preprocessing [19,20] and automatic feature engineering [21,22] to automatic model selection [18,23] and automatic hyper-parameter tuning [24,25]. Some approaches attempt to automatically and simultaneously choose a learning algorithm and optimize its hyper-parameters.…”
Section: Automated Machine Learning (mentioning)
confidence: 99%
“…The authors declare that they have no conflict of interest. [1,20] The minimal number of data points required in order to create a leaf. Min samples split [2,20] The minimal number of data points required to split an internal node.…”
Section: Declarations (mentioning)
confidence: 99%
“…[1,20] The minimal number of data points required in order to create a leaf. Min samples split [2,20] The minimal number of data points required to split an internal node. imputation mean, median, mode Strategy for imputing missing numeric variables.…”
Section: Declarations (mentioning)
confidence: 99%
“…Importantly, notice that while Quarry is being applied in a real-world project, for data privacy reasons, the data exemplifying Quarry's functionalities are either publicly available (i.e., published by the United Nations) or synthetically generated to simulate real data internal to WHO. global WHO Integrated Data Platform (WIDP) 5, powered by District Health Information System 2 (DHIS2) 6. Data coming from WIDP register either individual events (e.g., patient diagnosis and treatment, dwelling inspections) or collective reports (e.g., total number of infected patients over a period of time and in a certain territory, total number of administered drugs).…”
Section: Dataspaces (mentioning)
confidence: 99%
“…The latter, however, is required when the Data Analyst already wants to apply some transformations for the sake of improving the results of his analysis. To this end, in the Data preprocessing component, by learning from historical knowledge and knowing the type of analysis (e.g., a supervised learning problem), Quarry is able to recommend additional complex transformations (e.g., feature selection) that have a potentially positive impact on the final analysis [5] (e.g., increasing the predictive accuracy of the supervised learning).…”
Section: Big Data Integration Workflows (mentioning)
confidence: 99%