Restoring reproducibility of Jupyter notebooks

Wang, Jiawei; Kuo, Tzu-yang; Li, Li; Zeller, Andreas

doi:10.1145/3377812.3390803

Cited by 16 publications

(6 citation statements)

References 19 publications

(21 reference statements)


“…For instance, [34], [35], and [36] laid out principles for reproducible computational research in general. In a similar vein, [37] and [38] looked at specifics of computational reproducibility in the life sciences, [39] explored the use of Docker, a containerization tool, in reproducibility contexts, and [40] looked at the reproducibility of R scripts archived in an institutional repository, while [41], [42], as well as [43], [44], and [45] zoomed in on Jupyter notebooks, a popular file format for documenting and sharing computational workflows. Though most of these guiding documents are language agnostic, language-specific approaches to computational reproducibility have also been outlined, for example, for Python [46].…”

Section: Introduction (mentioning)

confidence: 99%

Computational reproducibility of Jupyter notebooks from biomedical publications

Samuel, Mietchen (2024). GigaScience


Background: Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows, including for research publications. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications.

Approach: We address computational reproducibility at 2 levels: (i) using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining the article’s full text, trying to locate them on GitHub, and attempting to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central has grown in a highly dynamic fashion.

Results: Out of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 publications, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to rerun automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we reran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions.

Conclusions: We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
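The counts in the Results part of the abstract above form a funnel, and the share surviving each stage is easier to see when computed. The following is a minimal sketch, not part of the cited study: the numbers are copied from the abstract, while the stage labels and the helper name `stage_rates` are assumptions made here for illustration.

```python
# Counts quoted from the abstract; stage labels are illustrative shorthand.
funnel = [
    ("notebooks found", 27_271),
    ("written in Python", 22_578),
    ("dependencies declared", 15_817),
    ("dependencies installed", 10_388),
    ("ran without errors", 1_203),
    ("identical results", 879),
]


def stage_rates(stages):
    """Return (label, count, share of previous stage, share of first stage)."""
    total = stages[0][1]
    previous = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / total))
        previous = count
    return rows


for label, count, of_previous, of_total in stage_rates(funnel):
    print(f"{label:<24}{count:>7,}  {of_previous:6.1%} of previous  {of_total:6.1%} of all")
```

Read this way, roughly 11.6% of the 10,388 rerun notebooks finished without errors, and about 73% of those error-free runs matched the original results, which is about 3.2% of all 27,271 notebooks.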


“…Jupyter Notebooks are the "de-facto standard" for data scientists [14], and much has been learned about how reproducible they are, the quality of the code written in them, and the narratives that describe the analyses within them [15,18,21,22]. These studies agree that notebook code is frequently low-quality and error-prone.…”

Section: Introduction (mentioning)

confidence: 99%

Error Identification Strategies for Python Jupyter Notebooks

Robinson, Ernst, Vargas et al. (2022). Preprint


Computational notebooks, such as Jupyter or Colab, combine text and data analysis code. They have become ubiquitous in the world of data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors in these different forms might be different. In this paper, we present an exploratory, observational study into how notebook users find and understand potential errors in notebooks. We presented users with notebooks pre-populated with common notebook errors: errors rooted in either the statistical data analysis, the knowledge of domain concepts, or in the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying errors. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes to improve how data scientists discover errors more easily in the notebooks they write.


“…The growing popularity of computational notebooks has attracted the interest of many researchers (e.g., [60,72,73]). Besides the clear advantages of providing a narrative through code cells interleaved with inline documentation, many pitfalls and downsides have been identified when using notebooks [13,25], particularly in collaborative environments or when they are developed as production-level artifacts within AI-based systems [26,36,41,53,59].…”

Section: Introduction (mentioning)

confidence: 99%

“…Wang et al. [71] describe this collaborative scenario as a 'scatter-gather' process where data scientists switch repeatedly between individual exploratory analyses (scatter) and insights discussion sessions (gather) until project completion. As such, here data scientists should make sure to fully leverage the self-documenting nature of computational notebooks, by writing exhaustive explanations of their work in natural language text [60] and checking that their computation can be re-executed without errors and fully reproduced by other colleagues [53,72].…”

Section: Introduction (mentioning)

confidence: 99%

Eliciting Best Practices for Collaboration with Computational Notebooks

Quaranta, Calefato, Lanubile (2022). Preprint


Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data science with computational notebooks. With this aim, we first look for best practices through a multivocal literature review. Then, we conduct interviews with professional data scientists to assess their awareness of these best practices. Finally, we assess the adoption of best practices through the analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform. Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work. Nonetheless, they do not consistently follow all the recommendations as, depending on specific contexts, some are deemed unfeasible or counterproductive due to the lack of proper tool support. As such, we envision the design of notebook solutions that allow data scientists not to have to prioritize exploration and rapid prototyping over writing code of quality.

CCS Concepts: • Human-centered computing → Empirical studies in collaborative and social computing; • Software and its engineering → Collaboration in software development.
