Towards Intelligent Data Analysis: The Metadata Challenge

Bilalli, Besim; Abelló, Alberto; Aluja‐Banet, Tomàs; Wrembel, Robert

doi:10.5220/0005876203310338

Cited by 17 publications

(13 citation statements)

References 10 publications

“…Inter-metadata is classified into dataset containment, provenance, logical cluster and content similarity by the author of [9]. Intra-metadata is classified into data characteristics, definitional, navigational, activity, lineage, rating and assessment [2,6,30]. The second classification is evolved, but it can still be improved.…”

Section: Metadata (mentioning)

confidence: 99%

Data Lakes: Trends and Perspectives

Ravat, Yan. 2019. Lecture Notes in Computer Science.

As a relatively new concept, the data lake has neither a standard definition nor an acknowledged architecture. We therefore study the existing work and propose a complete definition and a generic, extensible architecture for data lakes. We also introduce three future research axes connected to our health-care Information Technology (IT) activities: (i) metadata management, which covers intra- and inter-metadata; (ii) a unified ecosystem for companies' data warehouses and data lakes; and (iii) data lake governance.

“…And Content similarity which means that different datasets share the same attributes. -For intra-metadata [28], we retain data characteristics, definitional, navigational and lineage metadata proposed in [2] and add the access, quality and security metadata.…”

Section: Metadata (mentioning)

confidence: 99%

Data Lakes: Trends and Perspectives

Ravat, Yan. 2019. Lecture Notes in Computer Science.

“…Metadata. In our previous work [3], we studied and classified all types of metadata that can be used by systems that intelligently support the user in the different steps of the data analytics process. PRESISTANT, considers: 1) 54 dataset characteristics consisting of different summary characteristics (e.g., number of instances, dimensionality, class entropy, mean attribute entropy, etc.)…”

Section: Architecture and Implementation (mentioning)

confidence: 99%

PRESISTANT: Data Pre-processing Assistant

Bilalli, Abelló, Aluja‐Banet, et al. 2018. Lecture Notes in Business Information Processing. Self-citation.
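The citation statement above names several dataset characteristics used as metadata: number of instances, dimensionality, class entropy, and mean attribute entropy. As a rough illustration only, the sketch below computes these four values for a small categorical dataset. It is not PRESISTANT's implementation; the function names `entropy` and `dataset_characteristics` are hypothetical, and it assumes that all attributes are discrete, that the class label is the last column, and that dimensionality means the number of non-class attributes.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dataset_characteristics(rows, class_index=-1):
    """Compute a few summary characteristics of a categorical dataset.

    Assumptions (not taken from the cited papers): every row has the same
    number of columns, all values are discrete, and `class_index` points
    at the class label column.
    """
    n_columns = len(rows[0])
    class_index = class_index % n_columns  # normalise negative indices
    attribute_indices = [i for i in range(n_columns) if i != class_index]

    class_values = [row[class_index] for row in rows]
    attribute_entropies = [
        entropy([row[i] for row in rows]) for i in attribute_indices
    ]

    return {
        "n_instances": len(rows),
        "dimensionality": len(attribute_indices),
        "class_entropy": entropy(class_values),
        "mean_attribute_entropy": sum(attribute_entropies)
        / len(attribute_entropies),
    }


# Toy dataset: two attributes (outlook, temperature) and a class label.
rows = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rainy", "mild", "yes"),
    ("rainy", "cool", "yes"),
]
characteristics = dataset_characteristics(rows)
```

Numeric attributes would need discretisation, or different statistics such as mean and variance, before an entropy of this kind is meaningful.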

“…In our previous work [1], we studied and classified all types of metadata that can be used by systems that intelligently support the user during the process of data analysis. These systems may vary in terms of the methodology they follow (e.g., case based reasoning, planning systems, etc.)…”

Section: Meta-learning For Data Pre-processing (mentioning)

confidence: 99%

Automated Data Pre-processing via Meta-learning

Bilalli, Abelló, Aluja‐Banet, et al. 2016. Lecture Notes in Computer Science. Self-citation.

Abstract. A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes rather than with categorical attributes, or the other way around. As a matter of fact, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there exists a staggeringly large number of alternatives, and non-experienced users become overwhelmed. We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning. Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
