Research Objects: Towards Exchange and Reuse of Digital Knowledge

Bechhofer, Sean; De Roure, David; Gamble, Matthew; Goble, Carole; Buchan, Iain

doi:10.1038/npre.2010.4626.1

Cited by 132 publications

(73 citation statements)

References 6 publications


“…A number of proposals are emerging for modelling and managing Research Objects (RO) [8], which augment primary and derived data with ancillary elements, such as a description of the experimental process of data generation and transformation, publications based on the data products, and more, which can enable improved capabilities for data interpretation. In our work we focus on a particular type of descriptive metadata that may find its place in an RO, namely provenance traces, which describe the dependencies of data products obtained through a process consisting of a sequence of transformations, from other data elements that participated in the process, namely its inputs along with any intermediate results.…”

Section: Discussion Of Related Work and Conclusion (mentioning)

confidence: 99%

Linking multiple workflow provenance traces for interoperable collaborative science

Missier, Ludäscher, Bowers, et al. 2010

The 5th Workshop on Workflows in Support of Large-Scale Science

Self Cite


Abstract: Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can "stitch together" traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability not only through often elusive workflow standards but through shared provenance information from public repositories.


“…The Research Object (RO) approach [16], [17] is a new direction in this research field. RO defines an extendable model, which aggregates a number of resources in a core or unit.…”

Section: State Of the Art (mentioning)

confidence: 99%

Classification of scientific workflows based on reproducibility analysis

Bánáti, Kacsuk, Kozlovszky 2016

2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)


Abstract: In the scientific community, one of the most vital challenges is the reproducibility of a workflow execution. The parameters necessary for the execution (we call them descriptors) can be external, depending for example on the computing infrastructure (grids, clusters and clouds) or on third-party resources, or internal, belonging to the code of the workflow, such as variables. Consequently, during re-execution these parameters may change or become unavailable, and can ultimately prevent the workflow from being reproduced. However, in most cases the lack of the original parameters can be compensated for by replacing, evaluating or simulating the values of the descriptors, at some extra cost, in order to make the workflow reproducible. Our goal in this paper is to classify scientific workflows based on the method and cost by which they can be made reproducible.


“…SEAD tools are attempting to reduce the manual barrier to submission of data from scientists in the long tail, those that generate small but highly voluminous data sets. SEAD had adopted the notion of the Research Object (RO) [12] as the unit of preservation, and uses Komadu to track the lifecycle of the RO through derivation, revision, and reuse.…”

Section: Motivation (mentioning)

confidence: 99%

Komadu: A Capture and Visualization System for Scientific Data Provenance

Suriarachchi, Zhou, Plale 2015

Journal of Open Research Software


Introduction: Data provenance is information about the entities, activities and people who have effected some type of transformation on a data product through the product's lifecycle. Data provenance captured from scientific applications is a critical precursor to data sharing and reuse. For researchers wanting to repurpose and reuse data, it is a source of information about the lineage and attribution of the data, which is needed in order to establish trust in a data set. Data provenance has been shown to be useful in results validation, failure tracing, and reproducibility. The Komadu provenance capture system is standalone, meaning it is not coupled to or dependent upon any database management system, repository, or scientific workflow system. It provides an ingest API through which provenance notifications are fed into the system at high speeds, and a query API through which provenance information can be queried. The data model is both event oriented and graph oriented, in that graphs are pieced together in Komadu based on the events received from the environment.

Komadu has its roots in the Karma [2] provenance capture system, an earlier version that complied with the OPM community standard [3] both for defining the type of provenance notifications that the system accepted, and for defining the format of the results. Komadu, on the other hand, supports the W3C PROV specification [1], which provides far richer types of relationships and has a more formal model for handling time than does OPM. Karma was additionally limited by assuming that every notification belonging to the same external activity carried a common global identifier shared across all components (services, methods etc.) of the external environment. This limitation was found to be severe in applications where provenance is captured not only at the application level, but also in the larger environment where the application runs. Take for instance a distributed application running in PlanetLab [7] and running under Twister [8]; it is highly limiting to expect provenance events generated from the application, from PlanetLab, and from Twister to all have shared knowledge about any single global identifier. This limitation derives from Karma's early days, when it tracked provenance for applications running within a single workflow system. Additionally, a researcher may be interested in tracking lineage starting from some data product or agent. Such scenarios are not supported by Karma.

In this paper, we introduce the Komadu [9] provenance capture and visualization system. Komadu is a complete redesign and reimplementation of Karma that supports new features while addressing the above-mentioned limitations of Karma. The main contributions of Komadu are as follows. Even though Komadu has been used most extensively in relation to scientific research, its interfaces are designed to collect and visualize provenance of any kind of application needing provenance.

