2018 IEEE 14th International Conference on E-Science (E-Science) 2018
DOI: 10.1109/escience.2018.00040

Skluma: An Extensible Metadata Extraction Pipeline for Disorganized Data

Cited by 10 publications (6 citation statements)
References 22 publications

“…These are flattened to Neo4j graph structures with extensible metadata management in the data lake, categorizing for kinds of data: raw data, metadata, additional semantics, and the data fragment identifiers. Skluma [136] GEMMS [64], [117] HANDLE [43] Data vault [57], [107] Diamantini et al [34], [35], [36] Aurum [48] Ingestion Metadata modeling Sawadogo et al [127] GOODS [67], [68] DS-Prox [3], [4], [5] KAYAK [90], [91] Nargesian et al [104] Ronin [110] Dataset organization…”
Section: Single Data Store (mentioning)
confidence: 99%
“…6.4, we further address extracting hidden metadata such as functional dependencies. Given semi-structured or unstructured data, existing approaches [53], [61], [117] extract primarily structural metadata, while the one in [136] extracts metadata related to content and context.…”
Section: Metadata Extraction (mentioning)
confidence: 99%
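The distinction this statement draws between structural metadata and content metadata can be illustrated with a short Python sketch over a semi-structured JSON record. The record, the key-path notation (`readings[].v`), and the helpers `walk`, `structural_metadata`, and `content_metadata` are hypothetical illustrations; they are not the extractors of [53], [61], [117], or [136].

```python
import json
from collections import defaultdict

# Hypothetical semi-structured record used for illustration only.
RECORD = json.loads('{"sample": "S1", "readings": [{"t": 0, "v": 1.5}, {"t": 1, "v": null}]}')

def walk(obj, prefix=""):
    """Yield (key_path, scalar_value) pairs from nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from walk(item, f"{prefix}[]")
    else:
        yield prefix, obj

def structural_metadata(obj):
    """Structure only: which key paths exist and which Python types the parsed values have."""
    types = defaultdict(set)
    for path, value in walk(obj):
        types[path].add(type(value).__name__)
    return {path: sorted(names) for path, names in types.items()}

def content_metadata(obj):
    """Content: value count, null count, and distinct non-null values per key path."""
    values = defaultdict(list)
    for path, value in walk(obj):
        values[path].append(value)
    return {
        path: {
            "count": len(vs),
            "nulls": sum(v is None for v in vs),
            "distinct": len({v for v in vs if v is not None}),
        }
        for path, vs in values.items()
    }

if __name__ == "__main__":
    print(structural_metadata(RECORD))
    print(content_metadata(RECORD))
```

Here `structural_metadata(RECORD)` only reports that `readings[].v` holds floats and nulls, while `content_metadata(RECORD)` also reveals that one of its two values is missing.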
“…The Hadoop Distributed File System (HDFS) is one of the most frequently mentioned data storage systems for data lakes [21,119,13]. HDFS supports a wide range of DATAMARAN [48] Skluma [118] Metadata modeling Generic metadata model (GEMMS [108], Constance [59]) Data vault [98,52] Graph-based metadata model (Diamantini et al [32,33], Aurum [43], Sawadogo et al [113])…”
Section: File-based Storage Systems (mentioning)
confidence: 99%
“…For scientific data files, Skluma [118] extracts JSON metadata for content and context of scientific data files. It first finds the name, path, size, and extension of the files; then it infers file types and adds specific extractors accordingly to process tabular data, free texts or null values, etc.…”
Section: Metadata Extraction (mentioning)
confidence: 99%
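The extraction flow described in this statement (basic file attributes first, then file-type inference, then type-specific extractors) can be sketched in Python as below. This is a minimal sketch under our own assumptions, not Skluma's actual code or API: the functions `base_metadata`, `infer_type`, `extract_tabular`, `extract_text`, `extract`, and `demo`, the extension-based type inference, and the JSON field names are all hypothetical.

```python
import csv
import json
import os
import tempfile

# Hypothetical Skluma-style pipeline; names and fields are assumptions,
# not Skluma's actual API.

def base_metadata(path):
    """Step 1: record the file's name, path, size, and extension."""
    name = os.path.basename(path)
    return {
        "name": name,
        "path": os.path.abspath(path),
        "size": os.path.getsize(path),
        "extension": os.path.splitext(name)[1].lower(),
    }

def infer_type(meta):
    """Step 2: infer a coarse file type (assumption: by extension only)."""
    if meta["extension"] in {".csv", ".tsv"}:
        return "tabular"
    if meta["extension"] in {".txt", ".md"}:
        return "text"
    return "unknown"

def extract_tabular(path):
    """Tabular extractor: header, row count, and empty (null) cells per column."""
    delimiter = "\t" if path.endswith(".tsv") else ","
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    if not rows:
        return {"columns": [], "rows": 0, "null_counts": {}}
    header, body = rows[0], rows[1:]
    null_counts = {
        col: sum(1 for r in body if i >= len(r) or r[i].strip() == "")
        for i, col in enumerate(header)
    }
    return {"columns": header, "rows": len(body), "null_counts": null_counts}

def extract_text(path):
    """Free-text extractor: a simple word count."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {"word_count": len(f.read().split())}

# Step 3: choose extractors according to the inferred type.
EXTRACTORS = {"tabular": extract_tabular, "text": extract_text}

def extract(path):
    """Run the pipeline and return one JSON-serializable metadata record."""
    meta = base_metadata(path)
    meta["type"] = infer_type(meta)
    extractor = EXTRACTORS.get(meta["type"])
    if extractor is not None:
        meta[meta["type"]] = extractor(path)
    return meta

def demo():
    """Run the pipeline on a small temporary CSV file with one empty cell."""
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "obs.csv")
        with open(p, "w", newline="") as f:
            f.write("site,temp\nA,21.5\nB,\n")
        return extract(p)

if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

In this sketch the dispatch table `EXTRACTORS` is what keeps the pipeline extensible: supporting a new file type means adding one inference rule and one extractor function, without touching `extract`.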
“…However, despite the simpler API for metadata storage, an inherited limitation of key-value store is; lack of multi-attribute or multi-dimensional search queries, which is a common trend in scientific applications and communities [2], [27], [30], [46]. Scientific data is largely unstructured and contains a lot of descriptive metadata in the form of key-value pair attributes [26], and retrieving the desired dataset usually depends on such multiple metadata attributes [30], [47]. Besides, such retrieval often includes additional tags and annotations in search query provided by scientists and applications [2], [26].…”
Section: ) Object Sharing Controls (mentioning)
confidence: 99%
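To make the limitation in this statement concrete: a plain key-value store answers exact lookups by key, whereas the queries described here filter datasets on several metadata attributes at once. The Python sketch below contrasts a key lookup, a full-scan multi-attribute search, and a search over a secondary (attribute, value) index. The dataset IDs, attribute names, and all helper functions are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple

# Hypothetical dataset metadata: dataset ID -> key-value attributes.
METADATA: Dict[str, Dict[str, Any]] = {
    "ds-001": {"instrument": "beamline-A", "element": "Fe", "year": 2017},
    "ds-002": {"instrument": "beamline-A", "element": "Cu", "year": 2018},
    "ds-003": {"instrument": "beamline-B", "element": "Fe", "year": 2018},
}

def lookup(metadata, dataset_id):
    """The native key-value operation: fetch one record by its key."""
    return metadata.get(dataset_id)

def scan_search(metadata, **criteria):
    """Multi-attribute query without secondary indexes: scan every record."""
    return sorted(
        ds_id for ds_id, attrs in metadata.items()
        if all(attrs.get(k) == v for k, v in criteria.items())
    )

def build_index(metadata):
    """Secondary index mapping (attribute, value) to the set of dataset IDs."""
    index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)
    for ds_id, attrs in metadata.items():
        for k, v in attrs.items():
            index[(k, v)].add(ds_id)
    return index

def indexed_search(index, **criteria):
    """Answer a conjunctive multi-attribute query by intersecting posting sets."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    postings = [index.get((k, v), set()) for k, v in criteria.items()]
    return sorted(set.intersection(*postings))

if __name__ == "__main__":
    print(lookup(METADATA, "ds-002"))
    print(scan_search(METADATA, element="Fe", year=2018))
    print(indexed_search(build_index(METADATA), element="Fe", year=2018))
```

Both searches return `['ds-003']` for the query above. The scan cost grows with the number of datasets, which is why systems that need multi-attribute retrieval typically maintain secondary indexes on top of the key-value layer.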