An empirical study of the textual similarity between source code and source code summaries

McBurney, Paul W.; McMillan, Collin

doi:10.1007/s10664-014-9344-6

Cited by 31 publications

(8 citation statements)

References 37 publications


“…Because of how LSS compares two bodies of text, large sentences often get artificially inflated, as a large number of words means at least one word is more likely to be semantically similar. We encountered this in previous work. When this did occur, conciseness scores and accuracy scores were usually lower.…”

Section: Discussion (mentioning)

confidence: 85%

Automated feature discovery via sentence selection and source code summarization

McBurney

Liu

McMillan

2016

J. Softw. Evol. and Proc.

Self Cite


Programs are, in essence, a collection of implemented features. Feature discovery in software engineering is the task of identifying key functionalities that a program implements. Manual feature discovery can be time consuming and expensive, leading to automatic feature discovery tools being developed. However, these approaches typically only describe features using lists of keywords, which can be difficult for readers who are not already familiar with the source code. An alternative to keyword lists is sentence selection, in which one sentence is chosen from among the sentences in a text document to describe that document. Sentence selection has been widely studied in the context of natural language summarization but is only beginning to be explored as a solution to feature discovery. In this paper, we compare four sentence selection strategies for the purpose of feature discovery. Two are off-the-shelf approaches, while two are adaptations we propose. We present our findings as guidelines and recommendations to designers of feature discovery tools.



“…Therefore, there is an urgent need to generate a short description for the code to describe the code function accurately and effectively avoid errors caused by differences in conceptual understanding between maintainers and developers. [28].…”

Section: Problem Statement (mentioning)

confidence: 99%

A Multi-Module Based Method for Generating Natural Language Descriptions of Code Fragments

Gao

Jiang

et al. 2021

IEEE Access


Code fragment natural language description generation, also known as code summarization, refers to obtaining a natural language sequence describing a given code fragment's functionality. It is broadly agreed that applying code summarization in production can significantly improve the efficiency of software development and maintenance. In recent years, syntactic analysis (SA) technology and Latent Dirichlet Allocation (LDA) have been widely used in code summarization and have achieved good results. However, most of the existing techniques focus on core code statements, and thus their generated summaries lack a logical description of the code fragment's holistic information. To this end, we propose a code summarization method based on multiple modules to generate natural language for each code statement by constructing a new type of natural language template. Meanwhile, to utilize the code fragment's holistic information, we adopt the code statement partition rules and cosine similarity measure to rank and optimize the weight of the overall information of the code fragment, and finally generate the holistic natural language description of the code fragment. The experimental results demonstrate that our method can generate more concise and logical natural language descriptions than existing models.


“…McBurney and McMillan propose generating documentation summaries for Java methods using the call graph [7]. Furthermore, they propose an approach to evaluate a summary using textual similarity of that summary to the source code [42]. Haiduc et al. [8] investigate the suitability of several text summarization techniques to automatically generate term-based summaries for methods and classes.…”

Section: Related Work (mentioning)

confidence: 99%

Developer Reading Behavior While Summarizing Java Methods: Size and Context Matters

Abid

Sharif

Dragan

et al. 2019

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)


An eye-tracking study of 18 developers reading and summarizing Java methods is presented. The developers provide a written summary for methods assigned to them. In total, 63 methods are used from five different systems. Previous studies on this topic use only short methods presented in isolation usually as images. In contrast, this work presents the study in the Eclipse IDE allowing access to all the source code in the system. The developer can navigate via scrolling and switching files while writing the summary. New eye-tracking infrastructure allows for this improvement in the study environment. Data collected includes eye gazes on source code, written summaries, and time to complete each summary. Unlike prior work that concluded developers focus on the signature the most, these results indicate that they tend to focus on the method body more than the signature. Moreover, both experts and novices tend to revisit control flow terms rather than reading them for a long period. They also spend a significant amount of gaze time and have higher gaze visits when they read call terms. Experts tend to revisit the body of the method significantly more frequently than its signature as the size of the method increases. Moreover, experts tend to write their summaries from source code lines that they read the most.
