Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called "cells", notebooks give users a particularly tight feedback loop. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates, making execution behavior difficult to reason about and leading to errors and a lack of reproducibility. We present nbsafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. nbsafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate nbsafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, nbsafety identified 117 sessions with potential safety errors; in the remaining 549 sessions, the cells that nbsafety identified as resolving safety issues were more than 7× more likely to be selected by users for re-execution than a random baseline, even though these users were not running nbsafety and were therefore not influenced by its suggestions.
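To make the class of error concrete, below is a minimal sketch of the stale-state hazard nbsafety is designed to flag. The session, cell contents, and variable names are our own illustration, not an example from the paper; the cells are flattened into the order they were executed.

```python
# Hypothetical Jupyter session, flattened in execution order.
# Comments mark cell boundaries.

# Cell 1 (run 1st): define a parameter.
threshold = 10

# Cell 2 (run 2nd): derive a value that depends on `threshold`.
passing = [x for x in range(20) if x > threshold]

# Cell 1 again (run 3rd): the user edits the parameter and reruns the cell.
threshold = 5

# Cell 3 (run 4th): reads `passing`, which still reflects the old
# threshold of 10. A stock kernel executes this silently; a kernel
# that traces lineage, as nbsafety does, can detect that `passing`
# depends on a stale value of `threshold`, flag this cell as unsafe,
# and highlight Cell 2 as the cell whose re-execution resolves it.
print(len(passing))  # prints 9, not the 14 the user likely expects
```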
Exploratory data analysis is a crucial part of data-driven scientific discovery. Yet discovering insights through visualization can be a manual and painstaking process. This article discusses some of the lessons we learned from working with scientists to design visual data exploration systems, along with design considerations for future tools.
In this paper, we introduce Crowdclass, a novel framework that integrates the learning of advanced scientific concepts with the crowdsourcing microtask of image classification. In Crowdclass, we design questions that serve as both a learning experience and a scientific classification. This differs from conventional citizen science platforms, which decompose high-level questions into a series of simple microtasks that require no scientific background knowledge to complete. We facilitate learning within the microtask by scaffolding content so that it is appropriate for each participant's level of knowledge. We conduct a between-group study of 93 participants on Amazon Mechanical Turk comparing Crowdclass to the popular citizen science project Galaxy Zoo. We find that the scaffolded presentation of content enables learning of more challenging concepts. By examining the relationship between user motivation, learning, and performance, we derive general design principles for learning-as-an-incentive interventions applicable to other crowdsourcing applications.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context in which an article is cited and indicate whether the citing work provides supporting or contrasting evidence. scite is used by students and researchers around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.