Towards building a universal defect prediction model

Zhang, Feng; Mockus, Audris; Keivanloo, Iman; Zou, Ying

doi:10.1145/2597073.2597078

Cited by 116 publications

(72 citation statements)

References 41 publications

“…For example, Turhan et al [26] applied a nearest neighbor filtering technique to filter out those irrelevant project data in the setting of CPDP, leading to a better prediction performance. More discusses on the comparison between WPDP and CPDP please refer to [4,10,27,28]. Unfortunately, very few prior studies paid attention to the issue in question in CPDP settings.…”

Section: Related Work (mentioning)

confidence: 99%

“…So, we should take into consideration various factors (rather than just accuracy) when applying them to different types of actual projects with limited resources, which is required to make an optimal (or near-optimal) tradeoff among generality, performance and construction cost. That is, we want to find one or more appropriate regression models that can be used in different scenarios, because the previous studies about defect-proneness prediction have showed that the classifiers which are simple and easy to use tend to perform well in both within- and cross-project scenarios [10,27]. In particular, is this still practicable for defect numbers prediction?…”

Section: Research Questions (mentioning)

confidence: 99%

An empirical study on predicting defect numbers

Chen

2015

International Conferences on Software Engineering and Knowledge Engineering

Abstract. Defect prediction is an important activity to make software testing processes more targeted and efficient. Many methods have been proposed to predict the defect-proneness of software components using supervised classification techniques in within- and cross-project scenarios. However, very few prior studies address the above issue from the perspective of predictive analytics. How to make an appropriate decision among different prediction approaches in a given scenario remains unclear. In this paper, we empirically investigate the feasibility of defect numbers prediction with typical regression models in different scenarios. The experiments on six open-source software projects in PROMISE repository show that the prediction model built with Decision Tree Regression seems to be the best estimator in both of the scenarios, and that for all the prediction models, the results yielded in the cross-project scenario can be comparable to (or sometimes better than) those in the within-project scenario when choosing suitable training data. Therefore, the findings provide a useful insight into defect numbers prediction for those new and inactive projects.

“…• Remove code comments that contain any of the commonly used terms in defect prediction [20]. bug, fix, error, issue, crash, problem, fail, defect, patch …”

Section: Comment Selection (mentioning)

confidence: 99%
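
The comment-selection rule quoted above can be illustrated with a minimal sketch. It is not the citing paper's implementation: the function names `keep_comment` and `filter_comments` are hypothetical, the term list contains only the terms visible in the quotation (the original list is truncated), and matching case-insensitively at the start of a word is an assumption.

```python
import re

# Terms shown in the quoted statement. The quotation ends with an ellipsis,
# so the full list used by the citing paper may be longer.
DEFECT_TERMS = ("bug", "fix", "error", "issue", "crash",
                "problem", "fail", "defect", "patch")

# Assumption: match case-insensitively at the start of a word, so that
# "fixed" and "failure" are caught while "prefix" is not.
_PATTERN = re.compile(r"\b(?:" + "|".join(DEFECT_TERMS) + r")", re.IGNORECASE)


def keep_comment(comment: str) -> bool:
    """Return True if the comment contains none of the defect-related terms."""
    return _PATTERN.search(comment) is None


def filter_comments(comments):
    """Drop every comment that mentions a defect-related term."""
    return [c for c in comments if keep_comment(c)]


if __name__ == "__main__":
    sample = [
        "Returns the size of the list",
        "Fixed crash on null input",
        "prefix tree lookup",
    ]
    for kept in filter_comments(sample):
        print(kept)
```

Anchoring the pattern at a word boundary is one possible design choice; a plain substring test would also discard comments such as "prefix tree lookup".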

CloCom: Mining existing source code for automatic comment generation

Wong

Liu

Tan

2015

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

136

Abstract. Code comments are an integral part of software development. They improve program comprehension and software maintainability. The lack of code comments is a common problem in the software industry. Therefore, it is beneficial to generate code comments automatically. In this paper, we propose a general approach to generate code comments automatically by analyzing existing software repositories. We apply code clone detection techniques to discover similar code segments and use the comments from some code segments to describe the other similar code segments. We leverage natural language processing techniques to select relevant comment sentences. In our evaluation, we analyze 42 million lines of code from 1,005 open source projects from GitHub, and use them to generate 359 code comments for 21 Java projects. We manually evaluate the generated code comments and find that only 23.7% of the generated code comments are good. We report to the developers the good code comments, whose code segments do not have an existing code comment. Amongst the reported code comments, seven have been confirmed by the developers as good and committable to the software repository while the rest await for developers' confirmation. Although our approach can generate good and committable comments, we still have to improve the yield and accuracy of the proposed approach before it can be used in practice with full automation.

“…In the area of defect prediction for quality improvement, Peters et al [5] introduce guidelines to be used in the building of software quality predictors in case of scarcity of data while D'Ambros et al present a comparison between the different prediction approaches [6]. Zhang et al present in [7] a study for the specification of a universal defect predictor. Gamalielsson et al [8] define the health of an open source ecosystem as an important decision factor when considering the adoption of an OSS component.…”

Section: Table 1: European Projects Focusing On OSS Data Analysis (mentioning)

confidence: 99%

The RISCOSS Platform for Risk Management in Open Source Software Adoption

Franch

Kenett

Mancinelli

et al. 2015

Open Source Systems: Adoption and Impact

Abstract. Managing risks related to OSS adoption is a must for organizations that need to smoothly integrate OSS-related practices in their development processes. Adequate tool support may pave the road to effective risk management and ensure the sustainability of such activity. In this paper, we present the RISCOSS platform for managing risks in OSS adoption. RISCOSS builds upon a highly configurable data model that allows customization to several types of scopes. It implements two different working modes: exploration, where the impact of decisions may be assessed before making them; and continuous assessment, where risk variables (and their possible consequences on business goals) are continuously monitored and reported to decision-makers. The blackboard-oriented architecture of the platform defines several interfaces for the identified techniques, allowing new techniques to be plugged in.
