Data Analytics and Management in Data Intensive Domains

Leonid; Yannis; Malkov, O. Yu.; Nikolay; Stupnikov, Sergey A.; Vladimir

doi:10.1007/978-3-030-23584-0

Cited by 3 publications

(1 citation statement)

References 0 publications

“…This together suggests that a deep interconnection of both approaches can be extremely fruitful, because concrete implementations of digital objects will lead to data structures that implicitly comply with at least parts of the policies. The idea of investigating this coupling more deeply and of describing FAIR Digital Objects (FDO) as digital objects that fulfill all FAIR principles was pursued, among others, by the GEDE Digital Object Topic Group of European Data Experts (GEDE 2019) and is also described in (Schultes 2018).…”

Section: The FAIR Digital Object Framework / The FAIR Principles (mentioning, confidence: 99%)

Digital Objects – FAIR Digital Objects: Which Services Are Required?

Schwardmann

2020

Data Science Journal

Some of the early Research Data Alliance working groups reused the notion of digital objects as digital entities described by metadata and referenced by a persistent identifier. In recent times the FAIR principles have taken on a prominent role as a framework for the sustainability of scientific data. Both approaches have always focused on machine actionability, the capability of computational systems to use services on data without human intervention. The more technical approach of digital objects turned out to provide a complementary, technical view on several aspects of the FAIR policy framework. After a deeper analysis and integration of these concepts by a group of European data experts, the discussion intensified on so-called FAIR digital objects. But these need to be accompanied by services as building blocks for automated processes. We describe the components of this framework and its potential here, as well as which services inside this framework are required.

Necessary Abstractions in the Data Domain

Several studies in relevant data analytics projects, for instance a survey of RDA Europe (RDA Europe 2019) from 2013, report that up to 80% of the time of experts working with data is wasted on data wrangling (i.e. making data ready for analytics). This suggests that only a high degree of automation based on simple structures can provide an alternative to this highly inefficient and error-prone way of handling data. The major obstacle to automation is the heterogeneity and complexity of data, and abstraction is a generic way to hide this heterogeneity and complexity through encapsulation and virtualization. Encapsulation hides details that are not needed at a specific layer. For instance, at the data infrastructure layer no distinction needs to be made between data, metadata, software, semantic assertions, etc. All of them can be seen as some kind of data, for example as files in a filesystem that are copied, changed or deleted.
At that layer, none of the operations distinguish between metadata and data, whereas at a data management and reuse layer the distinction is necessary and metadata must be used to govern the management operations on data. Virtualization substitutes objects by their logical representation. The most abstract form of such a logical representation is a pointer to the object, a classical and widely used approach in computer science that hides all complexity behind a pure reference to the object. Virtual machines, as another example of virtualization, hide only the hardware while still exposing most of the internal structure in the logical representation.
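The two abstractions can be illustrated with a small sketch. It is not taken from the paper: the `DigitalObject` record, the in-memory `Registry` standing in for a PID resolver, the `example/0001` identifier and the `license` metadata key are all hypothetical. The infrastructure-layer copy treats data and metadata as the same opaque bytes (encapsulation), callers hold only a PID and resolve it on demand (virtualization by pointer), and the management layer consults metadata before acting on the data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DigitalObject:
    """A digital entity referenced by a persistent identifier (PID) and described by metadata."""
    pid: str
    metadata: Dict[str, str]
    bitstream: bytes


# Infrastructure layer (encapsulation): data and metadata are both opaque
# byte sequences, so copying never looks at what the bytes mean.
def copy_blob(store: Dict[str, bytes], src: str, dst: str) -> None:
    store[dst] = store[src]


# Virtualization: the PID acts as a pointer; callers hold only the reference.
class Registry:
    """Hypothetical in-memory PID registry (stands in for a real resolver)."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def register(self, obj: DigitalObject) -> str:
        self._objects[obj.pid] = obj
        return obj.pid

    def resolve(self, pid: str) -> DigitalObject:
        return self._objects[pid]


# Management layer: metadata (here a hypothetical "license" key) governs
# whether an operation on the data is allowed.
OPEN_LICENSES = {"CC-BY-4.0", "CC0-1.0"}


def may_replicate(registry: Registry, pid: str) -> bool:
    return registry.resolve(pid).metadata.get("license") in OPEN_LICENSES


if __name__ == "__main__":
    store = {"a.csv": b"x,y\n1,2\n", "a.meta.json": b'{"license": "CC-BY-4.0"}'}
    copy_blob(store, "a.meta.json", "b.meta.json")  # metadata copied like any data
    reg = Registry()
    pid = reg.register(DigitalObject("example/0001", {"license": "CC-BY-4.0"}, store["a.csv"]))
    print(may_replicate(reg, pid))  # True
```

The point of the split is that only `may_replicate` needs to understand metadata; `copy_blob` would work unchanged for any kind of content.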

A Photovoltaic System Model Integrating FAIR Digital Objects and Ontologies

et al. 2023

Smart grids of the future will create and provide huge volumes of data, which are subject to FAIR (Findable, Accessible, Interoperable, and Reusable) data management solutions when used within the scientific domain and for operation. FAIR Digital Objects (FDOs) provide access to (meta)data, and ontologies explicitly describe metadata as well as application data objects and domains. The present paper proposes a novel approach to integrate FAIR digital objects and ontologies as metadata models in order to support data access for energy researchers, energy research applications, operational applications and energy information systems. A photovoltaic (PV) system model is selected as the first example domain to be modeled using an ontology and integrated with FAIR digital objects. For this purpose, a review of existing energy ontologies shows the need to develop a new PV ontology, which is introduced in the present paper together with its integration of FDOs. Furthermore, the concept of FDOs is integrated with the PV ontology in such a way that it allows for generalization. In this way, the present paper contributes to sustainable data management for smart grid operation, especially with respect to interoperability, by using ontologies and hence unambiguous semantics. An information system application that visualizes the PV system, the data describing it and the collected sensor data is proposed. As a proof of concept, the details of the use case implementation are presented.
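One way to picture "FDOs integrated with an ontology in a way that allows for generalization" is to let each FDO's type be an ontology class and to answer queries through the subclass hierarchy. The sketch below assumes this design; it is not the paper's PV ontology. The namespace `http://example.org/pv#`, the class names and the metadata keys are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical namespace standing in for the paper's PV ontology.
PV = "http://example.org/pv#"


@dataclass(frozen=True)
class OntologyClass:
    iri: str
    label: str
    parent: Optional["OntologyClass"] = None  # single-inheritance subclass link

    def is_a(self, other: "OntologyClass") -> bool:
        """True if this class equals `other` or is one of its subclasses."""
        node: Optional[OntologyClass] = self
        while node is not None:
            if node.iri == other.iri:
                return True
            node = node.parent
        return False


@dataclass
class FDO:
    pid: str                    # persistent identifier of the object
    type_class: OntologyClass   # the FDO's type is an ontology class
    metadata: Dict[str, object] = field(default_factory=dict)


COMPONENT = OntologyClass(PV + "PVComponent", "PV component")
MODULE = OntologyClass(PV + "PVModule", "PV module", COMPONENT)
INVERTER = OntologyClass(PV + "Inverter", "Inverter", COMPONENT)


def find_by_class(fdos: List[FDO], cls: OntologyClass) -> List[str]:
    """Return the PIDs of all FDOs typed by `cls` or any of its subclasses."""
    return [f.pid for f in fdos if f.type_class.is_a(cls)]


if __name__ == "__main__":
    fdos = [
        FDO("example/pv-001", MODULE, {"peakPower_W": 400}),
        FDO("example/inv-001", INVERTER, {"ratedPower_W": 5000}),
    ]
    print(find_by_class(fdos, COMPONENT))  # ['example/pv-001', 'example/inv-001']
    print(find_by_class(fdos, MODULE))     # ['example/pv-001']
```

A query for the general class returns module and inverter objects alike, while a query for a specific class narrows the result. A real ontology would typically be expressed in OWL/RDF rather than Python objects.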

Enhancing RDM in Galaxy by integrating RO-Crate

De Geest, Coppens, Soiland-Reyes

et al. 2022

RIO

We introduce how the Galaxy research environment (Jalili et al. 2020) integrates with RO-Crate as an implementation of Findable, Accessible, Interoperable, Reusable Digital Objects (FAIR Digital Objects / FDO) (Wilkinson et al. 2016, Schultes and Wittenburg 2018), and how using RO-Crate as an exchange mechanism for workflows and their execution history helps integrate Galaxy with the wider ecosystem of ELIXIR (Harrow et al. 2021) and the European Open Science Cloud (EOSC-Life) to enable FAIR and reproducible data analysis. RO-Crate (Soiland-Reyes et al. 2022) is a generic packaging format containing datasets and their description using standards for FAIR Linked Data. The format is based on schema.org (Guha et al. 2016) annotations in JSON-LD, which allows for rich metadata representation. The RO-Crate effort aims to make best practice in formal metadata description accessible and practical in a wide variety of situations, from an individual researcher working with a folder of data to large data-intensive computational research environments. The RO-Crate community brings together practitioners from very different backgrounds, with different motivations and use cases. Among the core target users are: researchers engaged in computation- and data-intensive, workflow-driven analysis; digital repository managers and infrastructure providers; individual researchers looking for a straightforward tool or how-to guide to “FAIRify” their data; and data stewards supporting research projects in creating and curating datasets. Given the wide applicability of RO-Crate and the lack of practical implementations of FDOs, ELIXIR (Harrow et al. 2021) co-opted this initiative as the project to define a common format for research data exchange and repository entries. Thus, during the last year it has been implemented in a wide range of services, such as WorkflowHub (Goble et al. 2021), a registry for describing, sharing and publishing scientific computational workflows, which uses RO-Crates as an exchange format to improve the reproducibility of computational workflows that follow the Workflow RO-Crate profile (Bacall et al. 2022); and LifeMonitor (Leo et al. 2022), a service supporting the sustainability of computational workflows developed as part of the EOSC-Life project, which uses RO-Crate as an exchange format for describing test suites associated with workflows. Tools have been developed to support these use cases and to improve the general usability of RO-Crate by providing user-friendly programmatic libraries for consuming and producing RO-Crates (ro-crate-py De Geest et al. 2022, ro-crate-ruby Bacall and Whitwell 2022, ro-crate-js Lynch et al. 2021).

The Galaxy project provides a multi-user research environment with data analysis and data management functionalities, aiming to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. As such, it stores not just analysis-related data but also the complete analytical workflow, including its metadata. The internal data model involves the history entity, which includes all steps performed in a specific analysis, and the workflow entity, which defines the structure of an analytical pipeline. From the start, Galaxy has aimed to enable reproducible analyses by providing capabilities to export (and import) all analysis history details and workflow data and metadata in a FAIR way. In this way it helps its users with their daily research data management.
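To make the packaging format concrete, the sketch below builds a minimal RO-Crate 1.1 metadata document (the contents of `ro-crate-metadata.json`) with a workflow as its main entity, and reads it back the way a consumer would. It is deliberately hand-built with the standard `json` module and is not the ro-crate-py API. The file names, the `#galaxy` local identifier and the license URL are illustrative assumptions, and a real Workflow RO-Crate has further requirements beyond what is shown.

```python
import json
from typing import Any, Dict, List

RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def make_crate(name: str, license_url: str, date_published: str,
               workflow_path: str, data_files: List[str]) -> Dict[str, Any]:
    """Build a minimal RO-Crate 1.1 metadata document with a workflow as main entity."""
    parts = [workflow_path, *data_files]
    return {
        "@context": RO_CRATE_SPEC + "/context",
        "@graph": [
            {   # metadata descriptor: points at the root and names the spec version
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": RO_CRATE_SPEC},
                "about": {"@id": "./"},
            },
            {   # root data entity: the package as a whole
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "datePublished": date_published,
                "hasPart": [{"@id": p} for p in parts],
                "mainEntity": {"@id": workflow_path},
            },
            {   # the workflow file, typed as in the Workflow RO-Crate profile
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "programmingLanguage": {"@id": "#galaxy"},
            },
            {"@id": "#galaxy", "@type": "ComputerLanguage", "name": "Galaxy"},
            *({"@id": f, "@type": "File"} for f in data_files),
        ],
    }


def main_workflow(crate: Dict[str, Any]) -> str:
    """Consumer side: follow descriptor -> root data entity -> mainEntity."""
    by_id = {e["@id"]: e for e in crate["@graph"]}
    root = by_id[by_id["ro-crate-metadata.json"]["about"]["@id"]]
    return root["mainEntity"]["@id"]


if __name__ == "__main__":
    crate = make_crate("Example analysis", "https://spdx.org/licenses/CC-BY-4.0",
                       "2022-01-01", "workflow.ga", ["input.fastq"])
    print(json.dumps(crate, indent=2))   # contents of ro-crate-metadata.json
    print(main_workflow(crate))          # workflow.ga
```

The flat `@graph` keyed by `@id` is what lets generic tools consume any crate: they only need the metadata descriptor to find the root and can then follow references.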
The Galaxy community is continuously improving and adding features, and the integration of the FAIR Digital Object principles is a natural next step. To support these FDOs, Galaxy leverages the RO-Crate Python client library (De Geest et al. 2022) and provides multiple entry points to import and export different research data objects representing its internal entities and associated metadata. These objects include: a workflow definition, which is used to share/publish the details of an analysis pipeline, including the graph of tools that need to be executed and metadata about the required data types; individual data files or a collection of datasets related to an analysis history; and a compressed archive of the entire analysis history, including its associated metadata such as the tools used, their versions, the parameters chosen, workflow invocation metadata, inputs, outputs, license, author, a CWLProv description (Khan et al. 2019) of the workflow, contextual references in the form of Digital Object Identifiers (DOIs), ‘EMBRACE Data And Methods’ ontology (EDAM) terms (Ison et al. 2013), etc. The adoption of RO-Crate by Galaxy allows a standardised exchange of FDOs with other platforms in the ELIXIR Tools ecosystem, such as WorkflowHub and LifeMonitor.
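The history archive records, per step, which tool ran, in which version, with which parameters, on which inputs and producing which outputs. A minimal sketch of how one such step could be expressed in JSON-LD uses the schema.org `CreateAction` type with its `instrument`, `object` and `result` properties and `PropertyValue` for parameters, as RO-Crate's provenance guidance does. This is an assumption for illustration, not Galaxy's actual export format; the helper names `step_entities` and `tools_used`, the tool names and the file names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def step_entities(step_id: str, tool_name: str, tool_version: str,
                  params: Dict[str, str], inputs: List[str],
                  outputs: List[str]) -> List[Dict[str, Any]]:
    """Describe one analysis step as schema.org JSON-LD entities (hypothetical helper)."""
    tool = {"@id": f"#tool-{tool_name}-{tool_version}", "@type": "SoftwareApplication",
            "name": tool_name, "softwareVersion": tool_version}
    param_values = [{"@id": f"#{step_id}-{k}", "@type": "PropertyValue", "name": k, "value": v}
                    for k, v in params.items()]
    action = {
        "@id": f"#{step_id}",
        "@type": "CreateAction",                   # one executed step
        "instrument": {"@id": tool["@id"]},        # which tool ran
        "object": [{"@id": i} for i in inputs]     # what it consumed ...
                  + [{"@id": p["@id"]} for p in param_values],  # ... incl. parameters
        "result": [{"@id": o} for o in outputs],   # what it produced
    }
    return [action, tool, *param_values]


def tools_used(graph: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """List distinct (tool, version) pairs recorded in a graph, e.g. for a methods section."""
    return sorted({(e["name"], e["softwareVersion"])
                   for e in graph if e["@type"] == "SoftwareApplication"})


if __name__ == "__main__":
    graph = (step_entities("step1", "example_trimmer", "1.0", {"min_length": "20"},
                           ["reads.fastq"], ["trimmed.fastq"])
             + step_entities("step2", "example_aligner", "2.3", {},
                             ["trimmed.fastq"], ["aligned.bam"]))
    print(tools_used(graph))  # [('example_aligner', '2.3'), ('example_trimmer', '1.0')]
```

Because the output of one step is the input of the next (here `trimmed.fastq`), the chain of `result` and `object` references is enough for a consumer to reconstruct the order of the analysis.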
Integrating RO-Crate deeply into Galaxy and offering import and export options for various Galaxy objects such as Research Objects allows for increased standardisation, improved Research Data Management (RDM) functionalities, a smoother user experience (UX) and improved interoperability with other systems. Integration into a platform used by biologists for data-intensive analysis facilitates the publication of workflows and workflow invocations at all skill levels and democratises the ability to perform Open Science.
