Dark Data as the New Challenge for Big Data Science and the Introduction of the Scientific Data Officer

Schembera, Björn; Durán, Juan Manuel

doi:10.1007/s13347-019-00346-x

Cited by 45 publications

(29 citation statements)

References 20 publications


“…The concept of e!DAL to expose even dark [43] and semi-structured research data is also applied to metadata. They are divided into technical metadata, which are stored within e!DAL, and specific semantic metadata.…”

Section: Results (mentioning)

confidence: 99%

The on-premise data sharing infrastructure e!DAL: Foster FAIR data for faster data acquisition

et al. 2020


Background: The FAIR data principle as a commitment to support long-term research data management is widely accepted in the scientific community. Although the ELIXIR Core Data Resources and other established infrastructures provide comprehensive and long-term stable services and platforms for FAIR data management, a large quantity of research data is still hidden or at risk of getting lost. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, e.g., time series of images or high-resolution hyper-spectral data; the quality of data formatting and annotation, e.g., with regard to structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. Results: To share these potentially dark data in a FAIR way and master these challenges, the ELIXIR Germany/de.NBI service Plant Genomic and Phenomics Research Data Repository (PGP) implements a “bring the infrastructure to the data” approach, which allows research data to be kept in place and wrapped in a FAIR-aware software infrastructure. This article presents new features of the e!DAL infrastructure software and the PGP repository as a best practice on how to easily set up FAIR-compliant and intuitive research data services. Furthermore, the integration of the ELIXIR Authentication and Authorization Infrastructure (AAI) and data discovery services are introduced as means to lower technical barriers and to increase the visibility of research data. Conclusion: The e!DAL software has matured into a powerful and FAIR-compliant infrastructure, while keeping the focus on flexible setup and integration into existing infrastructures and into the daily research process.


“…Auditing is integral as acquisition scales up to Big Data. The process of managing what Schembera and Durán (2020), describes as “tangible data” can be extremely time-consuming and costly for those involved and human or machine error can propagate, resulting in biases or leading to mostly unusable data (L'heureux et al, 2017). On the other side, is the auditing of “dark data.” This data type is estimated to be 90% (Johnson, 2015) of all stored data, and is largely unknown to the user.…”

Section: Methodology: Ethical Data Considerations (mentioning)

confidence: 99%

Considerations for a More Ethical Approach to Data in AI: On Data Representation and Infrastructure

Baird, Schuller. 2020. Front. Big Data


Data shapes the development of Artificial Intelligence (AI) as we currently know it, and for many years centralized networking infrastructures have dominated both the sourcing and subsequent use of such data. Research suggests that centralized approaches result in poor representation, and as AI is now integrated more in daily life, there is a need for efforts to improve on this. The AI research community has begun to explore managing data infrastructures more democratically, finding that decentralized networking allows for more transparency which can alleviate core ethical concerns, such as selection-bias. With this in mind, herein, we present a mini-survey framed around data representation and data infrastructures in AI. We outline four key considerations (auditing, benchmarking, confidence and trust, explainability and interpretability) as they pertain to data-driven AI, and propose that reflection of them, along with improved interdisciplinary discussion may aid the mitigation of data-based AI ethical concerns, and ultimately improve individual wellbeing when interacting with AI.


“…This means that such a role has the responsibility of supporting metadata annotation, building metadata models and checking the data inventory for unindexed data. Such a role is, for example, the Scientific Data Officer [9]. • Incentives.…”

Section: Metadata Processes (mentioning)

confidence: 99%

“…In contrast to DaRUS, the Novel Materials Discovery (NOMAD) laboratory (or Novel Materials Discovery Center of Excellence (NOMAD CoE)) is a prime example of a domain-specific data infrastructure which is highly integrated [20] in a virtual research environment. The repository part is complemented with the NOMAD Archive, the NOMAD Encyclopedia, the NOMAD Visualization Tools and the NOMAD Analytics Toolkit.…”

Section: Nomad (mentioning)

confidence: 99%

Research Data Infrastructures and Engineering Metadata

Horsch, Chiacchiera, Cavalcanti, et al. 2021. SpringerBriefs in Applied Sciences and Technology

Self Cite


This chapter introduces metadata models as a semantic technology for knowledge representation to describe selected aspects of a research asset. The process of building a hierarchical metadata model is reenacted in this chapter and highlighted by the example of EngMeta. Moreover, an overview of data infrastructures is given, the general architecture and functions are discussed, and multiple examples of data infrastructures in materials modelling are given.
