2022
DOI: 10.1145/3489465

Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

Abstract: Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machin…

Cited by 49 publications (21 citation statements)
References 76 publications

“…Generative models based on the transformer architecture [96] have recently been applied to the domain of software engineering. Code-fluent large language models are capable of generating code from natural language descriptions [105], translating code from one language to another [75], generating unit tests [92], and even generating documentation for code [36,38,97,98]. These models are probabilistic systems, and as such, do not always produce perfect results (e.g.…”
Section: Code-fluent Foundation Models and Human-centered Evaluations...
type: mentioning
confidence: 99%
“…Recently, models leveraging the transformer architecture [96] have been developed to perform domain-specific software engineering tasks, such as translating code between languages [75], generating documentation for code [36,38,97,98], and generating unit tests for code [92] (see Talamadupula [90] and Allamanis et al. [5] for surveys). Recently developed foundation models - large language models that can be adapted to multiple tasks and which exhibit emergent behaviors for which they have not been explicitly trained [14] - have also proven to be capable with source code.…”
Section: Introduction
type: mentioning
confidence: 99%
“…These findings motivate us to consider AI automation as a potential solution to support the tedious process of crafting documentation. Thus, we proposed Themisto, an automated code documentation generation system that integrates into the Jupyter Notebook environment [10]. We found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code that they would have ignored, and improved their satisfaction with the final notebook.…”
Section: B Themisto: Human-centered AI System To Assist Data Science ...
type: mentioning
confidence: 99%
“…When data scientists handle off analysis work, it is critical to understand how the analysis code is changed. Our previous interactive tools [9], [10] demonstrate two different approaches to help data scientists make sense of the code evolvement. However, explaining code changes only tackles half of the problem.…”
Section: Ditl
type: mentioning
confidence: 99%