Evaluating source code summarization techniques: Replication and expansion

Eddy, B. P.; Robinson, Jeffrey A.; Kraft, Nicholas A.; Carver, Jeffrey C.

doi:10.1109/icpc.2013.6613829

Cited by 109 publications

(76 citation statements)

References 23 publications


“…In the approach by Haiduc et al, the summary of each method consists of the top-n keywords based on these tf/idf scores. An independent study carried out by Eddy et al has confirmed that these keywords can form an accurate summary of Java methods [18]. See Section 6.3 for an example of the output from this approach.…”

Section: Vector Space Model Summarization (mentioning)

confidence: 96%
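The statement above describes summaries built from the top-n terms ranked by tf/idf. As a minimal sketch of that idea, and not the implementation used by Haiduc et al. or Eddy et al., the following Python treats each method body as a document. The tokenizer, the `tf * log(N / df)` weighting variant, the function names, and the toy corpus are all assumptions made for illustration; no stop-word or Java-keyword filtering is applied.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into lowercase terms, breaking camelCase identifiers."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", source):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        terms.extend(p.lower() for p in parts)
    return terms


def top_n_keywords(methods, n=3):
    """Return the n highest-scoring tf/idf terms for each method.

    methods maps a (hypothetical) method name to its source text.
    Ties are broken alphabetically so the output is deterministic.
    """
    term_counts = {name: Counter(tokenize(body)) for name, body in methods.items()}

    # Document frequency: in how many methods does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    total = len(term_counts)
    summaries = {}
    for name, counts in term_counts.items():
        scores = {t: tf * math.log(total / doc_freq[t]) for t, tf in counts.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        summaries[name] = ranked[:n]
    return summaries


if __name__ == "__main__":
    corpus = {
        "readFile": "String readFile(String path) { return reader.read(path); }",
        "writeFile": "void writeFile(String path, String data) { writer.write(path, data); }",
        "closeAll": "void closeAll() { reader.close(); writer.close(); }",
    }
    for method, keywords in top_n_keywords(corpus).items():
        print(method, keywords)
```

Terms that occur in every method score zero under this weighting, which is why method-specific terms such as `close` rise to the top of a method's keyword list.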

“…Earlier work includes approaches to explain failed tests [64], Java exceptions [11], change log messages [12], and systemic software evolution [29]. Studies of these techniques have shown that summarization is effective in comprehension [18] and traceability link recovery [3]. Nevertheless, no consensus has developed around what characterizes a "high quality" summary or what information should be included in these summaries.…”

Section: Source Code Summarization (mentioning)

confidence: 99%

“…In one solution, Haiduc et al proposed to adapt ideas from text summarization, and developed a tool that creates summaries by treating source code as blocks of natural language text [26]. Moreno et al [42] and Eddy et al [18] have built on this approach and verified that it can extract keywords relevant to the source code being summarized. Still, a consistent theme across all three of these studies is that different terms are relevant for different reasons, and that additional studies are necessary to understand what programmers prioritize when summarizing code.…”

Section: The Problem (mentioning)

confidence: 99%

“…RQ1 To what degree do programmers focus on the keywords that the VSM tf/idf technique [26,18] extracts?…”

Section: Research Questions (mentioning)

confidence: 99%


Improving automated source code summarization via an eye-tracking study of programmers

Rodeghero

McMillan

McBurney

et al. 2014

Proceedings of the 36th International Conference on Software Engineering

169

106


Source Code Summarization is an emerging technology for automatically generating brief descriptions of code. Current summarization techniques work by selecting a subset of the statements and keywords from the code, and then including information from those statements and keywords in the summary. The quality of the summary depends heavily on the process of selecting the subset: a high-quality selection would contain the same statements and keywords that a programmer would choose. Unfortunately, little evidence exists about the statements and keywords that programmers view as important when they summarize source code. In this paper, we present an eye-tracking study of 10 professional Java programmers in which the programmers read Java methods and wrote English summaries of those methods. We apply the findings to build a novel summarization tool. Then, we evaluate this tool and provide evidence to support the development of source code summarization systems.


“…There are also other approaches that are based on modeling for automatic summarization of source code as in [10].…”

Section: Related Work (mentioning)

confidence: 99%

Summarizing Services of Java Packages

Hammad

Abuljadayel

Khalaf

2016

LNSE


Abstract: Program comprehension is essential for code maintenance and evolution activities. It saves the time and effort of developers who want to perform code changes. It also minimizes the chances of introducing bugs. Textual summaries of source code greatly help code understanding activities. This paper presents an approach to automatically generate textual summaries for services implemented in Java packages. The summary is generated by analyzing the source code of the methods defined in the package. Each method represents a service provided by the package. Each service is summarized as a natural language textual description. The generated summary for a method mainly includes the data used and the names of invoked methods. Summaries of all methods defined in a package are refined and integrated to be reported as a comprehensive summary of the services provided by the package. The generated summaries are useful in different ways. They can be used by developers in their maintenance activities. They can also be useful for documentation purposes.

Index Terms: program comprehension, software maintenance, source code summarization.


Impact of structural weighting on a latent Dirichlet allocation–based feature location technique

Eddy

Kraft

Gray

2017

J Software Evolu Process

Self Cite


Text retrieval-based feature location techniques (FLTs) use information from the terms present in documents in classes and methods. However, relevant terms originating from certain locations (eg, method names) often comprise only a small part of the entire method lexicon. Feature location techniques should benefit from techniques that make greater use of this information. The primary objective of this study was to investigate how weighting terms from different locations in source code can improve a latent Dirichlet allocation (LDA)-based FLT. We conducted an empirical study of 4 subject software systems and 372 features. For each subject system, we trained 1024 different LDA models with new weighting schemes applied to leading comments, method names, parameters, body comments, and local variables. We conducted both a quantitative and qualitative analysis to identify the effects of using the weighting schemes on the performance of the LDA-based FLT. We evaluated weighting schemes based on mean reciprocal rank and spread of effectiveness measures. In addition, we conducted a factorial analysis to identify which locations have a main impact on the results of the FLT. We then examined the effects of adding information from class comments, class names, and fields to the top 10 configurations for each system. This results in an additional 640 different LDA models for each system. From our results, we identified a significant effect in the performance of an LDA-based weighting configuration when applying our weighting schemes to the LDA-based FLT. Furthermore, we found that adding information from each method's containing class can improve the effectiveness of an LDA-based FLT. Finally, we identified a set of recommendations for identifying better weighting schemes for LDA.

KEYWORDS

feature location, program comprehension, static analysis, term weighting, text retrieval

INTRODUCTION

Software features are functionalities that are accessible to developers and users. During the evolution of a software system, developers change the source code to add new features, enhance existing features, and remove defective features (bugs). When developers are tasked with changing the source code of a large or unfamiliar system, they must spend considerable time and effort on program comprehension activities to gain the knowledge needed to implement changes. Part of this process is called feature location, a software evolution task in which a developer locates the source code entities (eg, methods and classes) that implement a functionality (feature). 1 During software evolution, developers must perform feature location to identify the entities needed to add new functionalities, modify existing functionalities, or remove defects.

Given the scale of modern software systems, manual feature location is impractical.

built from one of these techniques does not contain information regarding the originating element of a term (eg, method name, local variable, and parameter), but rather contains only the (preprocessed) terms. The impacts o...
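The abstract above reports that weighting schemes were evaluated with mean reciprocal rank (MRR). As a hedged illustration of that measure only, and not the authors' evaluation code, the sketch below computes MRR from the 1-based rank of the first relevant method returned for each feature query. The function name and the convention that a query with no relevant result contributes zero are assumptions.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Compute MRR from the rank of the first relevant result per query.

    first_relevant_ranks: iterable of 1-based ranks; use None when a
    query returned no relevant result (assumed to contribute 0).
    """
    ranks = list(first_relevant_ranks)
    if not ranks:
        raise ValueError("at least one query is required")
    reciprocal_sum = sum(1.0 / rank if rank else 0.0 for rank in ranks)
    return reciprocal_sum / len(ranks)


if __name__ == "__main__":
    # Three hypothetical feature queries whose first relevant method
    # appeared at ranks 1, 2, and 4 in the result list.
    print(mean_reciprocal_rank([1, 2, 4]))
```

A higher MRR means relevant methods tend to appear nearer the top of the ranked list, so a developer inspects fewer candidates before reaching one.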
