2020
DOI: 10.1007/s10844-020-00608-7

On data lake architectures and metadata management

Abstract: Over the past two decades, we have witnessed an exponential increase of data production in the world. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are mainly characterized by volume, velocity, variety and veracity issues. Big data-related issues strongly challenge traditional data management and analysis systems. The concept of data lake was introduced to address them. A data lake is a large, raw data repository that stores an…

Citations: cited by 150 publications (91 citation statements)
References: 35 publications
“…There is a demand for stable and usually web-accessible storage [90,144,147]. Data Lakes ingest raw data in its original format from various data sources, meet their role as storage repositories, and allow users to query and explore them to extract knowledge [167]. In the Extract, Transform, Load (ETL) standard procedure, "Load" is close to "store," and this procedure performs a critical role in data storage [162].…”
Section: Analysis Phase Key Functions (mentioning)
confidence: 99%
“…A data lake is a large, raw data repository that stores and manages the company data bearing any format. The concept of data lake was introduced in the last decade in order to address issues related to processing big data [4]. Moreover, recently, the semantic data lakes [5] are introduced as an extension of the data lake supplying it with a semantic middleware, which allows the uniform access to original heterogeneous data sources.…”
Section: Fig 1 Modern Data Ecosystem (mentioning)
confidence: 99%
“…Delta Lake is a storage layer for improving the reliability of data lakes [36][37][38]. Delta Lake can operate on the basis of implemented data lakes using Apache Hadoop [11], Amazon S3 [32] or Azure Data Lake Storage [39].…”
Section: Delta Lake (mentioning)
confidence: 99%