2019
DOI: 10.1007/s13347-019-00346-x

Dark Data as the New Challenge for Big Data Science and the Introduction of the Scientific Data Officer

Abstract: Many studies in big data focus on the uses of data available to researchers, leaving without treatment data that is on the servers but of which researchers are unaware. We call this dark data, and in this article, we present and discuss it in the context of high-performance computing (HPC) facilities. To this end, we provide statistics of a major HPC facility in Europe, the High-Performance Computing Center Stuttgart (HLRS). We also propose a new position tailor-made for coping with dark data and general data …

citations
Cited by 45 publications
(29 citation statements)
references
References 20 publications
“…The concept of e!DAL to expose even dark [43] and semi-structured research data is also applied to metadata. They are divided into technical metadata, which are stored within e!DAL, and specific semantic metadata.…”
Section: Results (mentioning)
confidence: 99%
“…Auditing is integral as acquisition scales up to Big Data. The process of managing what Schembera and Durán (2020), describes as “tangible data” can be extremely time-consuming and costly for those involved and human or machine error can propagate, resulting in biases or leading to mostly unusable data (L'heureux et al, 2017). On the other side, is the auditing of “dark data.” This data type is estimated to be 90% (Johnson, 2015) of all stored data, and is largely unknown to the user.…”
Section: Methodology: Ethical Data Considerations (mentioning)
confidence: 99%
“…This means that such a role has the responsibility of supporting metadata annotation, building metadata models and checking the data inventory for unindexed data. Such a role is, for example, the Scientific Data Officer [9]. • Incentives.…”
Section: Metadata Processes (mentioning)
confidence: 99%
“…In contrast to DaRUS, the Novel Materials Discovery (NOMAD) laboratory (or Novel Materials Discovery Center of Excellence (NOMAD CoE)) is a prime example of a domain-specific data infrastructure which is highly integrated [20] in a virtual research environment. The repository part is complemented with the NOMAD Archive, the NOMAD Encyclopedia, the NOMAD Visualization Tools and the NOMAD Analytics Toolkit.…”
Section: Nomad (mentioning)
confidence: 99%