Several research tools and projects require groups of similar code changes as input. Examples are recommendation and bug finding tools that can provide valuable information to developers based on such data. With the help of similar code changes they can simplify the application of bug fixes and code changes to multiple locations in a project. But despite their benefit, the practical value of existing tools is limited, as users need to manually specify the input data, i.e., the groups of similar code changes.To overcome this drawback, this paper presents and evaluates two syntactical similarity metrics, one of them is specifically designed to run fast, in combination with two carefully selected and self-tuning clustering algorithms to automatically detect groups of similar code changes.We evaluate the combinations of metrics and clustering algorithms by applying them to several open source projects and also publish the detected groups of similar code changes online as a reference dataset. The automatically detected groups of similar code changes work well when used as input for LASE, a recommendation system for code changes.
During the life span of large software projects, developers often apply the same code changes to different code locations in slight variations. Since the application of these changes to all locations is time-consuming and error-prone, tools exist that learn change patterns from input examples, search for possible pattern applications, and generate corresponding recommendations. In many cases, the generated recommendations are syntactically or semantically wrong due to code movements in the input examples. Thus, they are of low accuracy and developers cannot directly copy them into their projects without adjustments.We present the Accurate REcommendation System (ARES) that achieves a higher accuracy than other tools because its algorithms take care of code movements when creating patterns and recommendations. On average, the recommendations by ARES have an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives. At the same time ARES achieves precision and recall values that are on par with other tools.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.