Coherence of comments and method implementations: a dataset and an empirical investigation

Corazza, Anna; Maggio, Valerio; Scanniello, Giuseppe

doi:10.1007/s11219-016-9347-1

Cited by 33 publications

(34 citation statements)

References 29 publications

“…We compare the multi-task model and the single-task model on public datasets [1,2,5] of three downstream tasks. A single-task model is trained on the data of a single task using the same architecture.…”

Section: Preliminary Results (mentioning)

confidence: 99%

A multi-task representation learning approach for source code

Wang

Dong

2020

Proceedings of the 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Langu

Representation learning has shown impressive results for a multitude of tasks in software engineering. However, most research still focuses on a single problem. As a result, the learned representations cannot be applied to other problems and lack generalizability and interpretability. In this paper, we propose a multi-task learning approach for representation learning across multiple downstream tasks of software engineering. From the perspective of generalization, we build a shared sequence encoder with a pretrained BERT for the token sequence and a structure encoder with a Tree-LSTM for the abstract syntax tree of code. From the perspective of interpretability, we integrate an attention mechanism to focus on different representations and set learnable parameters to adjust the relationship between tasks. We also present the early results of our model. The learning process analysis shows that our model improves significantly over strong baselines.

CCS Concepts: Computing methodologies → Artificial intelligence; Software and its engineering → Software organization and properties.

“…This grading depends on several factors that affect the text content, such as word length, sentence length, word form, and syllables or letters. This formula produces scores determining the readability level of the text as shown in Table 1 below [17,21,26,31]. Table 1.…”

Section: Flesch Reading Ease (mentioning)

confidence: 99%

“…Table 1. Flesch reading ease score to assess the ease of readability in a document [31]. This is another formula of text readability measurement designed by Rudolph Flesch to use the same core measures (word length and sentence length) as Flesch reading ease but it uses different weighting factors.…”

Section: Flesch Reading Ease (mentioning)

confidence: 99%

“…This is another formula of text readability measurement designed by Rudolph Flesch to use the same core measures (word length and sentence length) as Flesch reading ease but it uses different weighting factors. The following is the algorithm to determine the Flesch-Kincaid grade level [31,32]. Table 2 shows the scores generated and the correspondent meaning.…”

Section: Flesch Reading Ease (mentioning)

confidence: 99%

Enhancing Software Comments Readability Using Flesch Reading Ease Score

2020

Comments are used to explain the meaning of code and ease communications between programmers themselves, quality assurance auditors, and code reviewers. A tool has been developed to help programmers write readable comments and measure their readability level. It is used to enhance software readability by providing alternatives to both keywords and comment statements from a local database and an online dictionary. It is also a word-finding query engine for developers. Readability level is measured using three different formulas: the fog index, the Flesch reading ease score, and Flesch–Kincaid grade levels. A questionnaire has been distributed to 42 programmers and 35 students to compare the readability aspect between both new comments written by the tool and the original comments written by previous programmers and developers. Programmers stated that the comments from the proposed tool had fewer complex words and took less time to read and understand. Nevertheless, this did not significantly affect the understandability of the text, as programmers normally have quite a high level of English. However, the results from students show that the tool affects the understandability of text and the time taken to read it, while text complexity results show that the tool makes new comment text that is more readable by changing the three studied variables.
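The two Flesch measures named in this abstract can be sketched in Python as follows. This is a minimal illustration, not the cited tool's implementation: the constants are the commonly published ones for the Flesch reading ease score and the Flesch-Kincaid grade level, while the vowel-group syllable counter, the sentence splitter, and all function names are simplifying assumptions.

```python
import re


def count_syllables(word):
    """Rough syllable estimate (assumption): count vowel groups, drop a final silent 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text):
    """Return (sentences, words, syllables) using naive splitting rules."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text):
    """Higher scores indicate text that is easier to read."""
    sentences, words, syllables = text_stats(text)
    if sentences == 0 or words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(text):
    """Same inputs as reading ease, different weights; approximates a school grade."""
    sentences, words, syllables = text_stats(text)
    if sentences == 0 or words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    comment = "Returns the name."
    print(round(flesch_reading_ease(comment), 2))
    print(round(flesch_kincaid_grade(comment), 2))
```

Both functions depend only on average sentence length and average syllables per word, so a comment with shorter words and shorter sentences scores as easier to read under either measure.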

“…Corazza et al investigated several projects and devised an approach to detect the coherence between comments and a method's implementation using the Vector Space Model with tf-idf term weighting [8]. There are also proposals that focus on specific types of comments for detecting inconsistencies.…”

Section: Inconsistency Detection (mentioning)

confidence: 99%
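As a rough illustration of the Vector Space Model with tf-idf term weighting mentioned in the statement above, the Python sketch below scores the lexical similarity between a comment and a method body. The camelCase tokenizer, the smoothed idf formula, the function names, and the sample pairs are all assumptions made for illustration; this is not the procedure used by Corazza et al.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase terms, breaking camelCase identifiers apart."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [p.lower() for p in parts]


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def comment_code_similarity(pairs):
    """Score each (comment, method body) pair, with idf computed over all texts."""
    docs = [text for pair in pairs for text in pair]
    vecs = tfidf_vectors(docs)
    return [cosine(vecs[2 * i], vecs[2 * i + 1]) for i in range(len(pairs))]


if __name__ == "__main__":
    # Hypothetical sample pairs, not drawn from the dataset.
    pairs = [
        ("Returns the user name", "String getUserName() { return userName; }"),
        ("Closes the socket", "int computeTotal(int a, int b) { return a + b; }"),
    ]
    for (comment, _), score in zip(pairs, comment_code_similarity(pairs)):
        print(f"{comment!r}: {score:.3f}")
```

A low score only signals little shared vocabulary between comment and code; whether a given score counts as incoherent would require a threshold or classifier, which this sketch does not define.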

Detecting fragile comments

Ratol

Robillard

2017

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)

The development lifecycle of a software system demands incessant improvements in the source code of a system to maintain its high quality with improved performance and code readability. Refactoring is a common software development practice that reshapes the internal structure and non-functional properties of a system without modifying its core functionality. Many simple refactorings, such as renaming code elements or extracting a snippet from a large method to form a new method, can be performed with the help of automatic tools. Renaming code elements like classes, interfaces or methods is a widely used refactoring activity. With tool support, rename refactorings can rely on the program structure to ensure correctness of the code transformation. Unfortunately, the textual references to the renamed identifier present in unstructured comment text cannot be formally detected through the syntax of the programming language. These textual references to the previous version of a renamed identifier pose threats to the consistency between code and comments, which leads to poor program comprehensibility. The comments containing such textual references become fragile with respect to the renamed program element and are referred to as fragile comments. This thesis proposes a new rule-based approach to detect and fix the fragile comments that result from renaming the identifiers. We implemented this approach for the Java programming language in the form of an Eclipse plug-in called Fraco. Fraco takes into account the type of an identifier, its morphology (i.e., the part-of-speech tag and its inflectional form), its scope, which defines its visibility in the source code, and the location of comments in the source code with respect to the identifier. We evaluated the performance of our technique, as implemented for Java in Fraco, by comparing its precision and recall against hand-annotated benchmarks created for both development and test sets, each containing six target Java systems, and also compared the results against the performance of Eclipse's automated in-comment identifier replacement feature. Fraco performed with an average of 99% precision and recall on most components of both development and test data sets, and generally outperformed the baseline Eclipse feature. An average of 25% of the total identifiers of category type and method in the data sets had fragile comments after renaming, which further motivates the need for research on automatic comment refactoring.

