Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities

Cohen-Boulakia, Sarah; Belhajjame, Khalid; Collin, Olivier; Chopard, Jérôme; Froidevaux, Christine; Gaignard, Alban; Hinsen, Konrad; Larmande, Pierre; Le Bras, Yvan; Lemoine, Frédéric; Mareuil, Fabien; Ménager, Hervé; Pradal, Christophe; Blanchet, Christophe

doi:10.1016/j.future.2017.01.012

Cited by 138 publications

(109 citation statements)

References 68 publications


“…The literature provides a range of definitions for the reproducibility of in silico experiments by analogy to wet lab experiments [12,14,17,18,21,29,44]. Four levels of reproducibility are then commonly…”

Section: Reproducibility Of Computational Analyses 41 From Repeatabi... (mentioning)

confidence: 99%

The CoLoMoTo Interactive Notebook: Accessible and Reproducible Computational Analyses for Qualitative Biological Networks

Naldi, Hernandez, Levy, et al. 2018

Preprint

Self Cite


Analysing models of biological networks typically relies on workflows in which different software tools with sensitive parameters are chained together, many times with additional manual steps. The accessibility and reproducibility of such workflows is challenging, as publications often overlook analysis details, and because some of these tools may be difficult to install, and/or have a steep learning curve. The CoLoMoTo Interactive Notebook provides a unified environment to edit, execute, share, and reproduce analyses of qualitative models of biological networks. This framework combines the power of different technologies to ensure repeatability and to reduce users' learning curve of these technologies. The framework is distributed as a Docker image with the tools ready to be run without any installation step besides Docker, and is available on Linux, macOS, and Microsoft Windows. The embedded computational workflows are edited with a Jupyter web interface, enabling the inclusion of textual annotations, along with the explicit code to execute, as well as the visualisation of the results. The resulting notebook files can then be shared and re-executed in the same environment. To date, the CoLoMoTo Interactive Notebook provides access to software tools including GINsim, BioLQM, Pint, MaBoSS, and Cell Collective for the modelling and analysis of Boolean and multi-valued networks. More tools will be included in the future. We developed a Python interface for each of these tools to offer a seamless integration in the Jupyter web interface and ease the chaining of complementary analyses.

Recently, the scientific community has been increasingly concerned about difficulties in reproducing already published results. In the context of preclinical studies, observed difficulties to reproduce important findings have raised controversy (see e.g. [7,15,40,43], and [8] for a review on this topic). Although not invalidating the findings, these observations have shaken the community. In 2016, a Nature survey pointed to the multi-factorial origin of this "reproducibility crisis" [4]. Factors related to computational analyses were highlighted, in particular the unavailability of code and methods, along with the technical expertise required to reproduce the computations.

The scientific community is progressively addressing this problem. Prestigious conferences (such as two major conferences from the database community, namely, VLDB and SIGMOD) and journals such as PNAS, Biostatistics [38], Nature [41] and Science [54], to name only a few, now encourage or even require published results to be accompanied by all the information necessary to reproduce them. While the reproducibility challenges have first been observed in domains where deluge of data were quickly becoming available (e.g., Next Generation Sequencing data analyses), the problem is now present in many (if not all) communities where computational analyses and simulations are performed. In particular, the System Biology community is facing a pro...


“…With the development of new experimental technologies, food microbiologists and risk assessors are now confronted with large datasets that are computationally analyzed for extracting the biological information of interest. Facing the statistical complexity of data analysis and the heterogeneity of available software tools, Cohen-Boulakia et al. [39] argue that some scientific results will not stand the test of time. Indeed, no one will be able to reproduce results that are dependent of programmes that may not be maintained in the future.…”

Section: Transparency and Consistency (mentioning)

confidence: 99%

Towards transparent and consistent exchange of knowledge for improved microbiological food safety

Plaza‐Rodríguez, Haberbeck, Desvignes, et al. 2018

Current Opinion in Food Science


“…Developments in workflow management systems have led to the proposition of using workflow-centric research objects with executable components [13,34]. The use of workflow creation and management software allows researchers to utilize different resources to create complex analysis pipelines that can be executed locally, on institutional servers, and on the cloud [15,53]. Extensive reviews of current workflow systems for bioinformatics are linked [16,53-55].…”

Section: Workflow Management Systems (mentioning)

confidence: 99%

“…The use of workflow creation and management software allows researchers to utilize different resources to create complex analysis pipelines that can be executed locally, on institutional servers, and on the cloud [15,53]. Extensive reviews of current workflow systems for bioinformatics are linked [16,53-55]. Ongoing systems participate in the current trend of moving from graphical system back to script-like workflows.…”

Section: Workflow Management Systems (mentioning)

confidence: 99%

Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results

Alterovitz, Dean, Goble, et al. 2017

Preprint


Abstract. Precision medicine can be empowered by a personalized approach to patient care based on the patient's unique genomic sequence. To be used in precision medicine, genomic findings must be robust, reproducible, and experimental data capture should adhere to FAIR Data Guiding Principles. Moreover, precision medicine requires standardization that extends beyond wet lab procedures to computational methods.

Rapidly developing standardization technologies improves communication of genomic sequencing by introducing concepts such as error domain, usability domain, validation kit, and provenance information. These advancements allow data provenance to be standardized and ensure interoperability. Thus, a resulting bioinformatics computation instance that includes these advancements can be easily communicated, repeated and compared by scientists, regulators, clinicians and others, allowing a greater range of practical applications.

Advancing clinical trials, precision medicine, and regulatory submissions requires an umbrella of standards that not only fuses these elements, but also ensures efficient communication and documentation of genomic analyses. Through standardized bundling of HTS studies under an umbrella, regulatory agencies (FDA), academic researchers, and clinicians can expand collaboration to drive innovation in precision medicine with the potential for decreasing the time and cost associated with NGS workflow exchange, including FDA regulatory review submissions.
