How Data Scientists Use Computational Notebooks for Real-Time Collaboration

Wang, April Yi; Mittal, Atul K.; Brooks, Christopher; Oney, Steve

doi:10.1145/3359141

Cited by 103 publications

(75 citation statements)

References 25 publications

“…Computational notebooks are positioned as a potential solution to both support collaborative coding and communicating results to stakeholders [78]. However, a recent study reported reluctance for data scientists to directly communicate the in-progress model work in notebooks [65].…”

Section: Data Science Practices and Collaboration (mentioning)

confidence: 99%

Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models

Park, Wang, Kawas, et al. 2021

26th International Conference on Intelligent User Interfaces

Self Cite

Data scientists face a steep learning curve in understanding a new domain for which they want to build machine learning (ML) models. While input from domain experts could offer valuable help, such input is often limited, expensive, and generally not in a form readily consumable by a model development pipeline. In this paper, we propose Ziva, a framework to guide domain experts in sharing essential domain knowledge with data scientists for building NLP models. With Ziva, experts are able to distill and share their domain knowledge using domain concept extractors and five types of label justification over a representative data sample. The design of Ziva is informed by preliminary interviews with data scientists, conducted to understand current practices in the domain knowledge acquisition process for ML development projects. To assess our design, we run a mixed-methods case study to evaluate how Ziva can facilitate interaction between domain experts and data scientists. Our results highlight that (1) domain experts are able to use Ziva to provide rich domain knowledge, while maintaining low mental load and stress levels; and (2) data scientists find Ziva's output helpful for learning essential information about the domain, offering scalability of information, and lowering the burden on domain experts to share knowledge. We conclude this work by experimenting with building NLP models using the Ziva output for our case study.

CCS Concepts: Human-centered computing → Empirical studies in HCI; Interactive systems and tools.

“…For P2, domain experts gave an overview and touched on the basic concepts of each class. P3 pair-authored [78] with domain experts to bridge concepts and a mathematical formula that encapsulates the information. With this iterative learning process, data scientists were able to kick start model building.…”

Section: Limited Time and Limited Best Practices (mentioning)

confidence: 99%

“…These jumps cause changes in context, both in terms of the program state and analysts' mental models. The challenge of managing segments of analysis state is also faced in collaboration settings, where analysts sometimes jump through cells and need to understand cell dependencies [55]. Supporting analysts in navigating between segments of analysis in space and time poses additional challenges for the layout and temporal gaps.…”

Section: Non-linear Workflows and Asynchronous Collaborations (mentioning)

confidence: 99%

B2

Hellerstein, Satyanarayan 2020

Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology

Data scientists have embraced computational notebooks to author analysis code and accompanying visualizations within a single document. Currently, although these media may be interleaved, they remain siloed: interactive visualizations must be manually specified as they are divorced from the analysis provenance expressed via dataframes, while code cells have no access to users' interactions with visualizations, and hence no way to operate on the results of interaction. To bridge this divide, we present B2, a set of techniques grounded in treating data queries as a shared representation between the code and interactive visualizations. B2 instruments dataframes to track the queries expressed in code and synthesize corresponding visualizations. These visualizations are displayed in a dashboard to facilitate interactive analysis. When an interaction occurs, B2 reifies it as a data query and generates a history log in a new code cell. Subsequent cells can use this log to further analyze interaction results and, when marked as reactive, to ensure that code is automatically recomputed when a new interaction occurs. In an evaluative study with data scientists, we find that B2 promotes a tighter feedback loop between coding and interacting with visualizations. All participants frequently moved from code to visualization and vice versa, which facilitated their exploratory data analysis in the notebook.

“…The challenge is that this process is labor intensive, requiring input from multiple specialists with different skill sets [1,31,34,47,48]. As a result, AI and Human-Computer Interaction (HCI) researchers have investigated how to design systems with features that support data scientists in creating machine learning models [24,28,34,41,42,44].…”

Section: Introduction (mentioning)

confidence: 99%

“…Kross & Guo users expressed a strong desire for an integrated user interface with both code and narrative. These needs are perhaps best captured in the narrative uses [24] of the Jupyter Notebook environment [20,21], and researchers have conducted numerous studies of how data scientists incorporate notebooks into their workflows [38,41], how they conduct version control for notebooks [23], and how they enable simultaneous multi-user editing in notebooks [42].…”

Section: Introduction (mentioning)

confidence: 99%

AutoAIViz

Weidele, Weisz, Oduor, et al. 2020

Proceedings of the 25th International Conference on Intelligent User Interfaces

Artificial Intelligence (AI) can now automate the algorithm selection, feature engineering, and hyperparameter tuning steps in a machine learning workflow. Commonly known as AutoML or AutoAI, these technologies aim to relieve data scientists of tedious manual work. However, today's AutoAI systems often present little or no information about how they select and generate model results. Thus, users often do not understand the process, nor do they trust the outputs. In this short paper, we provide a first user evaluation by 10 data scientists of an experimental system, AutoAIViz, that aims to visualize AutoAI's model generation process. We find that the proposed system helps users to complete the data science tasks, and increases their understanding, toward the goal of increasing trust in the AutoAI system.
