Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

Wang, April Yi; Wang, Dakuo; Drozdal, Jaimie; Müller, Michael; Park, Soya; Weisz, Justin D.; Liu, Xuye; Wu, Lingfei; Dugan, Casey

doi:10.1145/3489465

Cited by 49 publications

(21 citation statements)

References 76 publications

“…Generative models based on the transformer architecture [96] have recently been applied to the domain of software engineering. Code-fluent large language models are capable of generating code from natural language descriptions [105], translating code from one language to another [75], generating unit tests [92], and even generating documentation for code [36,38,97,98]. These models are probabilistic systems, and as such, do not always produce perfect results (e.g.…”

Section: Code-fluent Foundation Models and Human-centered Evaluations... (mentioning)

confidence: 99%

“…Recently, models leveraging the transformer architecture [96] have been developed to perform domain-specific software engineering tasks, such as translating code between languages [75], generating documentation for code [36,38,97,98], and generating unit tests for code [92] (see Talamadupula [90] and Allamanis et al [5] for surveys). Recently developed foundation models - large language models that can be adapted to multiple tasks and which exhibit emergent behaviors for which they have not been explicitly trained [14] - have also proven to be capable with source code.…”

Section: Introduction (mentioning)

confidence: 99%

The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development

Ross, Martinez, Houde et al. 2023

Preprint

Large language models (LLMs) have recently been applied in software engineering to perform tasks such as translating code between programming languages, generating code from natural language, and autocompleting code as it is being written. When used within development tools, these systems typically treat each model invocation independently from all previous invocations, and only a specific limited functionality is exposed within the user interface. This approach to user interaction misses an opportunity for users to more deeply engage with the model by having the context of their previous interactions, as well as the context of their code, inform the model's responses. We developed a prototype system - the Programmer's Assistant - in order to explore the utility of conversational interactions grounded in code, as well as software engineers' receptiveness to the idea of conversing with, rather than invoking, a code-fluent LLM. Through an evaluation with 42 participants with varied levels of programming experience, we found that our system was capable of conducting extended, multi-turn discussions, and that it enabled additional knowledge and capabilities beyond code generation to emerge from the LLM. Despite skeptical initial expectations for conversational programming assistance, participants were impressed by the breadth of the assistant's capabilities, the quality of its responses, and its potential for improving their productivity. Our work demonstrates the unique potential of conversational interactions with LLMs for co-creative processes like software development.

CCS Concepts: • Human-centered computing → HCI theory, concepts and models; • Software and its engineering → Designing software; • Computing methodologies → Generative and developmental approaches.

“…These findings motivate us to consider AI automation as a potential solution to support the tedious process of crafting documentation. Thus, we proposed Themisto, an automated code documentation generation system that integrates into the Jupyter Notebook environment [10]. We found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code that they would have ignored, and improved their satisfaction with the final notebook.…”

Section: B Themisto: Human-centered AI System To Assist Data Science ... (mentioning)

confidence: 99%

“…When data scientists handle off analysis work, it is critical to understand how the analysis code is changed. Our previous interactive tools [9], [10] demonstrate two different approaches to help data scientists make sense of the code evolvement. However, explaining code changes only tackles half of the problem.…”

Section: Ditl (mentioning)

confidence: 99%

“…for generating natural language explanations of the data science code [10]; 3) a concept and prototype of implementing dataframe visualizations as a first-class citizen in data science programming environments to make sense of the impact of code changes [11]. Lastly, I will also present an ongoing project on improving awareness and avoiding conflict editing in real-time collaborative notebooks.…”

mentioning

confidence: 99%

Improving Real-Time Collaborative Data Science Through Context-Aware Mechanisms

Wang

2022

2022 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)

Self Cite

Data scientists benefit from collaboration: data scientists work across disciplines with a variety of stakeholders in practice [1]; data scientists collaborate between the team to improve work efficiency [2]; citizen data scientists collaborate in an open source manner to collectively explore topics of shared interests [3], [4]. However, collaboration in data science is often hard. Since data science is highly exploratory [5], the artifact and analysis often iterate fast. It is difficult to maintain a shared understanding across various collaborators. On the other hand, tools like computational notebooks provide a convenient approach for data scientists to run, document, and share analysis in a storytelling way [6]. It satisfies the basic collaboration needs for data scientists to communicate and iterate on each other's work. However, such benefits are rudimentary. There are still many open-ended questions about how to improve the collaboration experience by designing better collaborative data science tools. For example, data scientists often neglect to keep updated documentation during rapid exploration, which results in computational notebooks that are messy and difficult to read [7]; without strategic planning, working together in a shared notebook may block each other's work.

My research draws upon human-centered design techniques to identify barriers in real-world data science programming practices, and explore the design space of collaborative data science environments through tool-building. In this paper, I will first review our prior work on understanding how data scientists use computational notebooks for collaboration [8]. Through a mixed-method study, we found that working on the synchronous notebooks improves collaboration by creating a shared context, encouraging more exploration, and reducing communication costs. We also identified several challenges with the synchronous notebook editing tools such as producing messy and less organized notebooks, causing conflict editing without strategic planning.

Inspired by the study results as well as related work, we then developed a series of prototypes that aim to help data scientists handle off work during collaboration: 1) a prototype that captures the contextual links between messages and notebook elements [9]; 2) a set of automatic approaches

A Map of Exploring Human Interaction Patterns with LLM: Insights into Collaboration and Creativity

Li, 2024

Lecture Notes in Computer Science
