PRESISTANT: Data Pre-processing Assistant

Bilalli, Besim; Abelló, Alberto; Aluja‐Banet, Tomàs; Munir, Raina Faisal; Wrembel, Robert

doi:10.1007/978-3-319-92901-9_6

Cited by 8 publications

(10 citation statements)

References 17 publications

“…Over the last few years, a plethora of AutoML systems have been developed providing partial or complete ML automation, such as Auto-sklearn [7], TPOT [8], Auto-WEKA [9], ATM [10], as well as commercial systems such as Google AutoML, RapidMiner, DarwinAI, and DataRobot. These tools range from automatic data preprocessing [19,20] and automatic feature engineering [21,22] to automatic model selection [18,23] and automatic hyper-parameter tuning [24,25]. Some approaches attempt to automatically and simultaneously choose a learning algorithm and optimize its hyper-parameters.…”

Section: Automated Machine Learning (mentioning)

confidence: 99%

Towards big industrial data mining through explainable automated machine learning

Garouani, Ahmad, Bouneffa, et al. 2022. Int J Adv Manuf Technol

“…The authors declare that they have no conflict of interest. [1,20] The minimal number of data points required in order to create a leaf. Min samples split [2,20] The minimal number of data points required to split an internal node.…”

Section: Declarations (mentioning)

confidence: 99%

“…[1,20] The minimal number of data points required in order to create a leaf. Min samples split [2,20] The minimal number of data points required to split an internal node. imputation mean, median, mode Strategy for imputing missing numeric variables.…”

Section: Declarations (mentioning)

confidence: 99%

Towards Big Industrial Data Mining Through Explainable Automated Machine Learning

Garouani, Ahmad, Bouneffa, et al. 2021. Preprint

Industrial systems resources are capable of producing large amounts of data. These data are often in heterogeneous formats and distributed, yet they provide the means to mine information that can allow the deployment of intelligent management tools for production activities. For this purpose, it is necessary to implement knowledge extraction and prediction processes using Artificial Intelligence (AI) models, but the selection and configuration of the intended AI models tend to be increasingly complex for a non-expert user. In this paper, we present an approach and a software platform that may allow industrial actors, who are usually not familiar with AI, to select and configure algorithms optimally adapted to their needs. Hence, the approach is essentially based on automated machine learning. The resulting platform effectively enables a better choice among the combinations of AI algorithms and hyper-parameter configurations. It also makes it possible to provide features of explainability of the resulting algorithms and models, thus increasing the acceptability of these models in the practicing community of users. The proposed approach has been applied in the field of predictive maintenance. Current tests are based on the analysis of more than 360 databases from this field.

“…Importantly, notice that while Quarry is being applied in a real-world project, for data privacy reasons, the data exemplifying Quarry's functionalities are either publicly available (i.e., published by the United Nations) or synthetically generated to simulate real data internal to WHO. global WHO Integrated Data Platform (WIDP), powered by District Health Information System 2 (DHIS2). Data coming from WIDP register either individual events (e.g., patient diagnosis and treatment, dwellings inspection) or collective reports (e.g., total number of infected patients over a period of time and at a certain territory, total number of administered drugs).…”

Section: Dataspaces (mentioning)

confidence: 99%

“…The latter, however, is required when the Data Analyst already wants to apply some transformations for the sake of improving the results of his analysis. To this end, in the Data preprocessing component, by learning from historical knowledge and knowing the type of analysis (e.g., supervised learning problem), Quarry is able to recommend additional complex transformations (e.g., feature selection) that have a potentially positive impact on the final analysis [5] (e.g., increasing the predictive accuracy of the supervised learning).…”

Section: Big Data Integration Workflows (mentioning)

confidence: 99%

Quarry: A User-centered Big Data Integration Platform

et al. 2020

Self Cite

Obtaining valuable insights and actionable knowledge from data requires cross-analysis of domain data coming typically from various sources. Doing so inevitably imposes burdensome processes of unifying different data formats and discovering integration paths, all given the specific analytical needs of a data analyst. Along with large volumes of data, the variety of formats, data models, and semantics drastically contributes to the complexity of such processes. Although there have been many attempts to automate various processes along the Big Data pipeline, no unified platforms accessible by users without technical skills (like statisticians or business analysts) have been proposed. In this paper, we present a Big Data integration platform (Quarry) that uses hypergraph-based metadata to facilitate (and largely automate) the integration of domain data coming from a variety of sources and provides an intuitive interface to assist end users both in: (1) data exploration with the goal of discovering potentially relevant analysis facets, and (2) consolidation and deployment of data flows which integrate the data and prepare them for further analysis (descriptive or predictive), visualization, and/or publishing. We validate Quarry's functionalities with the use case of World Health Organization (WHO) epidemiologists and data analysts in their fight against Neglected Tropical Diseases (NTDs).
