Using Temporal and Semantic Developer-Level Information to Predict Maintenance Activity Profiles

Levin, Stanislav; Yehudai, Amiram

doi:10.1109/icsme.2016.21

Cited by 15 publications

(31 citation statements)

References 30 publications

“…Its Kappa on the other hand, would be 0, making this model much less appealing. (3) Our previous work [5] shows that source code change types as defined by Fluri et al [11] are statistically significant in the context of maintenance activity categories defined by Mockus et al [1]. We believe that boosting (i.e.…”

Section: Introduction (mentioning)

confidence: 90%
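
The quoted passage is truncated, so the model it refers to is not stated here. A standard property of Cohen's kappa is that a classifier that always predicts a single class scores 0 regardless of its accuracy. The sketch below illustrates this on made-up labels. The function name `cohens_kappa` and the toy label counts are assumptions for illustration and do not come from the cited papers.

```python
from collections import Counter


def cohens_kappa(actual, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(actual)
    # Observed agreement: fraction of items where both labelings match.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Expected chance agreement, computed from the marginal label frequencies.
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both labelings use one identical class;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (illustrative only): 5 corrective, 3 perfective, 2 adaptive.
actual = ["corrective"] * 5 + ["perfective"] * 3 + ["adaptive"] * 2
# A constant classifier that always answers with the most frequent class.
majority = ["corrective"] * len(actual)

accuracy = sum(a == p for a, p in zip(actual, majority)) / len(actual)
kappa = cohens_kappa(actual, majority)
```

On this toy data the constant classifier reaches 50% accuracy while its kappa is 0, which is why accuracy alone can make such a model look better than it is.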

“…First we classified the test dataset (the 15% of the entire labeled dataset) using a naive method to set an initial baseline. The naive method is based solely on searching for pre-defined words gathered from previous work [5], and returning the most frequent class (i.e., corrective) in case none of the keywords were present in a commit's message, see table 2 for more details. The results showed that 34.8% of the commits in the test dataset (60 commits) did not have any of the keywords present in their commit message, and were therefore automatically classified corrective.…”

Section: Utilizing Word Frequency Analysis (mentioning)

confidence: 99%
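
The naive baseline described in the statement above can be sketched roughly as follows. The keyword lists are hypothetical placeholders (the actual words come from the cited work and its table 2, which are not reproduced on this page), and the rule used when keywords from several classes match (most hits wins, ties broken by dictionary order) is an assumption. Only the fallback to the most frequent class, corrective, is taken from the quoted text.

```python
import re

# Hypothetical keyword lists, NOT the ones used in the cited papers.
KEYWORDS = {
    "corrective": ("fix", "bug", "fault"),
    "perfective": ("refactor", "cleanup", "improve"),
    "adaptive": ("add", "feature", "support"),
}

# Most frequent class, returned when no keyword is present (per the quoted text).
DEFAULT_CLASS = "corrective"


def classify_commit(message):
    """Classify a commit message by counting keyword hits per class."""
    words = re.findall(r"[a-z]+", message.lower())
    hits = {
        label: sum(word in keywords for word in words)
        for label, keywords in KEYWORDS.items()
    }
    if not any(hits.values()):
        return DEFAULT_CLASS
    # Assumed tie-breaking: the first class in KEYWORDS with the maximum count.
    return max(hits, key=hits.get)


examples = [
    "Fix null pointer bug in parser",
    "Refactor configuration loader",
    "Add support for YAML input",
    "Update README",  # no keyword present, falls back to the default class
]
labels = [classify_commit(message) for message in examples]
```

A baseline of this kind labels every keyword-free commit as corrective, which matches the behavior the statement reports for 34.8% of the test commits.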

“…Understanding these maintenance activities, performed in a source code repository, could help practitioners reduce uncertainty and improve cost-effectiveness [2] by planning ahead and preallocating resources towards source code maintenance. Maintenance activity profiles of software projects have therefore been a subject of research in numerous works [1][2][3][4][5][6]. To determine maintenance activity profiles, one must first classify the activities, which come in the form of developer commits to the version control system (VCS).…”

Section: Introduction (mentioning)

confidence: 99%

Boosting Automatic Commit Classification Into Maintenance Activities By Utilizing Source Code Changes

Levin

Yehudai

2017

Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering

Self Cite

Background: Understanding maintenance activities performed in a source code repository could help practitioners reduce uncertainty and improve cost-effectiveness by planning ahead and pre-allocating resources towards source code maintenance. The research community uses 3 main classification categories for maintenance activities: Corrective: fault fixing; Perfective: system improvements; Adaptive: new feature introduction. Previous work in this area has mostly concentrated on evaluating commit classification (into maintenance activities) models in the scope of a single software project. Aims: In this work we seek to design a commit classification model capable of providing high accuracy and Kappa across different projects. In addition, we wish to compare the accuracy and kappa characteristics of classification models that utilize word frequency analysis, source code changes, and a combination thereof. Method: We suggest a novel method for automatically classifying commits into maintenance activities by utilizing source code changes (e.g., statement added, method removed, etc.). The results we report are based on studying 11 popular open source projects from various professional domains from which we had manually classified 1151 commits, over 100 from each of the studied projects. Our models were trained using 85% of the dataset, while the remaining 15% were used as a test set. Results: Our method shows a promising accuracy of 76% and Cohen's kappa of 63% (considered "Good" in this context) for the test dataset, an improvement of over 20 percentage points, and a relative boost of ∼40% in the context of cross-project classification. Conclusions: We show that by using source code changes in combination with commit message word frequency analysis we are able to considerably boost classification quality in a project agnostic manner.

“…In the course of our studies [6][7][8], the data processing stage typically included the following aggregations: commit level; developer level; project level; global statistics. The analytical layer we present allows researchers to produce commit level aggregations (see Listing 5) and obtain statistics such as: change type frequencies, number of test case (test method) addition/removal/modification, number of test suite (test class) addition/removal/modification, associated ticket id, number of test files, and non test files in a given commit.…”

Section: Obtaining Fine Grained Source Code Changes (mentioning)

confidence: 99%
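
As a rough illustration of what a commit-level aggregation of change type frequencies might look like, here is a minimal sketch in plain Python. It does not use Spark or the analytical layer the statement describes. The `(commit_id, change_type)` record shape, the function name, and the change type strings are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def change_type_frequencies(changes):
    """Group fine-grained changes by commit and count each change type."""
    per_commit = defaultdict(Counter)
    for commit_id, change_type in changes:
        per_commit[commit_id][change_type] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {commit_id: dict(counts) for commit_id, counts in per_commit.items()}


# Illustrative records only; the identifiers and change type names are made up.
changes = [
    ("c1", "STATEMENT_INSERT"),
    ("c1", "STATEMENT_INSERT"),
    ("c1", "STATEMENT_DELETE"),
    ("c2", "METHOD_ADDED"),
]
frequencies = change_type_frequencies(changes)
```

The same grouping could be repeated with a developer or project key to obtain the other aggregation levels the statement lists.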

“…To effectively process large datasets, our analytical layer leverages Apache Spark [12] (henceforth Spark), a widely popular distributed computation engine. The analytical layer we suggest has been successfully used to conduct a number of studies in the field of software maintenance and evolution [6][7][8]. This leads us to believe it can be useful for researchers conducting studies that involve fine-grained source code changes.…”

Section: Introduction (mentioning)

confidence: 99%

Processing Large Datasets of Fined Grained Source Code Changes

Levin

Yehudai

2019

2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)

Self Cite

In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions and more of records. In this work we focus on studies involving fine-grained AST level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities, aimed to alleviate the processing of large datasets of fine grained source code changes. The capabilities we have introduced allow researchers to highly automate their repository mining process and streamline the data acquisition and processing phases. These capabilities have been successfully used to conduct a number of studies, in the course of which dozens of millions of fine-grained source code changes have been processed.
