Background: Understanding the maintenance activities performed in a source code repository could help practitioners reduce uncertainty and improve cost-effectiveness by planning ahead and pre-allocating resources for source code maintenance. The research community uses three main classification categories for maintenance activities: Corrective (fault fixing), Perfective (system improvements), and Adaptive (new feature introduction). Previous work in this area has mostly concentrated on evaluating models that classify commits into maintenance activities within the scope of a single software project. Aims: In this work we seek to design a commit classification model capable of providing high accuracy and kappa across different projects. In addition, we wish to compare the accuracy and kappa characteristics of classification models that utilize word frequency analysis, source code changes, and a combination thereof. Method: We suggest a novel method for automatically classifying commits into maintenance activities by utilizing source code changes (e.g., statement added, method removed). The results we report are based on studying 11 popular open source projects from various professional domains, from which we manually classified 1151 commits, over 100 from each of the studied projects. Our models were trained using 85% of the dataset, while the remaining 15% was used as a test set. Results: Our method shows a promising accuracy of 76% and Cohen's kappa of 63% (considered "Good" in this context) for the test dataset, an improvement of over 20 percentage points, and a relative boost of ∼40% in the context of cross-project classification. Conclusions: We show that by using source code changes in combination with commit message word frequency analysis we are able to considerably boost classification quality in a project-agnostic manner.
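The abstract above reports both accuracy and Cohen's kappa for a three-class commit classifier. As a minimal sketch of how these two measures relate, the Python below computes both from a list of true and predicted labels. The function names and the six example labels are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of commits whose predicted label matches the manual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both labelings pick the same class
    # if each drew labels independently from its own class distribution.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: both labelings use a single identical class.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for six commits (not taken from the study's dataset).
y_true = ["corrective", "corrective", "perfective", "perfective", "adaptive", "adaptive"]
y_pred = ["corrective", "perfective", "perfective", "perfective", "adaptive", "corrective"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.3f}")
```

Kappa is lower than accuracy here because part of the observed agreement would be expected by chance alone, which is why the abstract reports both figures.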
Despite vast interest in design patterns, the specification and application of patterns is generally assumed to rely on manual implementation. We describe a precise method of specifying how a design pattern is applied: by phrasing it as an algorithm in a meta-programming language. We present a prototype of a tool that supports the specification of design patterns and their realization in a given program. Our prototype allows automatic application of design patterns without hiding the source code text from the programmer, who may edit it at will. We demonstrate pattern specification using meta-programming techniques and a sample outcome of its application.
Predictive models for software projects' characteristics have traditionally been based on project-level metrics, employing little developer-level information, or none at all. In this work we suggest novel metrics that capture temporal and semantic developer-level information collected on a per-developer basis. To address the scalability challenges involved in computing these metrics for each and every developer for a large number of source code repositories, we have built a designated repository mining platform. This platform was used to create a metrics dataset based on processing nearly 1000 highly popular open source GitHub repositories, consisting of 147 million LOC, and maintained by 30,000 developers. The computed metrics were then employed to predict the corrective, perfective, and adaptive maintenance activity profiles identified in previous works. Our results show both strong correlation and promising predictive power with R² values of 0.83, 0.64, and 0.75. We also show how these results may help project managers to detect anomalies in the development process and to build better development teams. In addition, the platform we built has the potential to yield further predictive models leveraging developer-level metrics at scale.
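The abstract above summarizes predictive power with R² values. As a rough illustration, and assuming R² here denotes the usual coefficient of determination (the abstract does not define it), the sketch below computes it for a set of observed and predicted values. The function name and the sample numbers are hypothetical and are not drawn from the paper's dataset.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error left over after prediction.
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variance of the observations around their mean.
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical share of corrective commits per repository (illustrative only).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]

print(f"R^2 = {r_squared(observed, predicted):.2f}")
```

A value close to 1 means the predictions account for most of the variation in the observed profile, while a value near 0 means they do little better than predicting the mean.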