Proceedings of the 4th International Workshop on Predictor Models in Software Engineering 2008
DOI: 10.1145/1370788.1370801
Implications of ceiling effects in defect predictors

Abstract: Context: There are many methods that input static code features and output a predictor for faulty code modules. These data mining methods have hit a "performance ceiling", i.e., some inherent upper bound on the amount of information offered by, say, static code features when identifying modules which contain faults. Objective: We seek an explanation for this ceiling effect. Perhaps static code features have "limited information content", i.e., their information can be quickly and completely discovered by even si…
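The setup the abstract describes can be made concrete with a minimal sketch: a simple learner trained on static code features to predict fault-proneness. The feature names, the synthetic data, the log-transform, and the use of Gaussian Naive Bayes via scikit-learn are illustrative assumptions, not the paper's actual pipeline or datasets.

```python
# Minimal sketch of a static-code-feature defect predictor.
# Features and labels below are synthetic placeholders.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
# Hypothetical static code features: lines of code, cyclomatic complexity, volume.
loc = rng.lognormal(4.0, 1.0, n)
cc = rng.poisson(5, n) + 1
volume = loc * rng.uniform(2, 6, n)
X = np.column_stack([np.log(loc), np.log(cc), np.log(volume)])  # log-transformed metrics

# Synthetic ground truth: larger, more complex modules are more fault-prone.
p_fault = 1 / (1 + np.exp(-(0.8 * np.log(cc) + 0.3 * np.log(loc) - 3.5)))
y = rng.random(n) < p_fault

clf = GaussianNB()
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC over 5 folds: {scores.mean():.2f}")
```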

Cited by 154 publications (118 citation statements)
References: 55 publications
“…In other words, the number of defective files is far less than the number of defect-free files. Therefore, we use the under-sampling method, which is the most suitable sampling method for our datasets [16]. The pseudocode of the prediction model is given in Figure 3.…”
Section: Construction of the Prediction Model
confidence: 99%
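The under-sampling step this citation statement describes can be sketched as follows. The function name and array layout are illustrative assumptions, not the cited paper's implementation; the pseudocode in that paper's Figure 3 is not reproduced here.

```python
# A minimal sketch of random under-sampling: balance defective vs.
# defect-free files by discarding majority-class rows.
import numpy as np

def undersample(X, y, seed=0):
    """Randomly drop majority-class samples until classes are balanced."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)   # defective files (rare class)
    majority = np.flatnonzero(y == 0)   # defect-free files
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

A design note: under-sampling trades information for balance by discarding majority-class rows; alternatives such as over-sampling keep all data at the cost of duplicated minority examples.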
“…As a result, both increasing the efficiency of the software testing phase and delivering the software product to the market on time become possible. Reported results in the software defect prediction literature suggest that further progress in defect prediction performance can be achieved by increasing the content of the input data that defect predictors learn from, rather than by using different algorithms or increasing the size of the input data [17], [15], [16]. We can group some significant work in the literature in terms of its focus: algorithm-driven approaches, data-size-driven approaches, and data-content-driven approaches.…”
confidence: 99%
“…The question is whether there remains additional information in the data set that might be exploited to improve performance. While this is still an open question, and one to which we make some contribution through the present study, there is a view that a performance ceiling has been reached, and that the way forward lies in enriching the data with new information beyond existing metrics [12]. Nevertheless, the NASA data sets are freely available and remain attractive targets for researchers.…”
Section: Introduction
confidence: 98%
“…Although unsupervised learning has been applied, such methods also show unstable performance [8], [9]. Unlike unsupervised learning, active learning reduces the number of labeled instances required to achieve stable performance in the majority of reported results [10].”
Section: Introduction
confidence: 99%
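Pool-based uncertainty sampling is one common form of the active learning this citation statement refers to. The sketch below is a generic illustration under assumed names, seed sizes, and query budget; it is not the approach of reference [10].

```python
# A minimal sketch of pool-based active learning with uncertainty sampling:
# iteratively query labels for the instances the model is least sure about.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learn(X_pool, y_oracle, seed_size=10, budget=50, rng_seed=0):
    rng = np.random.default_rng(rng_seed)
    # Seed with a few labeled examples from each class, as in standard
    # pool-based setups, so the first fit sees both classes.
    pos = np.flatnonzero(y_oracle == 1)
    neg = np.flatnonzero(y_oracle == 0)
    labeled = list(rng.choice(pos, size=seed_size // 2, replace=False)) + \
              list(rng.choice(neg, size=seed_size - seed_size // 2, replace=False))
    clf = LogisticRegression(max_iter=1000)
    for _ in range(budget):
        clf.fit(X_pool[labeled], y_oracle[labeled])
        proba = clf.predict_proba(X_pool)[:, 1]
        uncertainty = np.abs(proba - 0.5)   # closest to 0.5 = least certain
        uncertainty[labeled] = np.inf       # never re-query labeled items
        labeled.append(int(np.argmin(uncertainty)))
    return clf, labeled
```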