2010
DOI: 10.1007/s10515-010-0070-z
Stable rankings for different effort models

Abstract: There exists a large and growing number of proposed estimation methods but little conclusive evidence ranking one method over another. Prior effort estimation studies suffered from "conclusion instability", where the rankings offered to different methods were not stable across (a) different evaluation criteria; (b) different data sources; or (c) different random selections of that data. This paper reports a study of 158 effort estimation methods on data sets based on COCOMO features. Four "best" methods were d…


Cited by 40 publications (35 citation statements)
References 41 publications (48 reference statements)
“…Recall the results of Shepperd et al [25] and Menzies et al [26]: different experimental conditions can change the rank of an effort estimator. Hence, it is important to study not just the rank of an estimator, but also how well that method performs across multiple experimental conditions such as: As to comparison summaries, the following procedure was repeated for each error measure.…”
Section: Experimental Conditions (supporting)
confidence: 53%
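The excerpt above argues for studying not just an estimator's rank but how that rank holds up across experimental conditions. A minimal sketch of that idea, with hypothetical method names and invented error values (not data from the cited studies):

```python
# Hypothetical sketch: rank effort estimators separately under each
# experimental condition, then measure how much each method's rank moves.
# Method names and error values below are invented for illustration.

# errors[condition][method] = error measure (lower is better)
errors = {
    "MRE/dataset-A":   {"CART": 0.42, "kNN": 0.35, "LinReg": 0.51},
    "MRE/dataset-B":   {"CART": 0.30, "kNN": 0.44, "LinReg": 0.39},
    "MdMRE/dataset-A": {"CART": 0.40, "kNN": 0.33, "LinReg": 0.47},
}

def ranks_per_condition(errors):
    """Rank methods (1 = best) independently under each condition."""
    out = {}
    for cond, scores in errors.items():
        ordered = sorted(scores, key=scores.get)  # best (lowest error) first
        out[cond] = {method: i + 1 for i, method in enumerate(ordered)}
    return out

ranks = ranks_per_condition(errors)

# A method is rank-stable if its rank varies little across conditions.
methods = errors["MRE/dataset-A"]
spread = {m: max(r[m] for r in ranks.values()) - min(r[m] for r in ranks.values())
          for m in methods}
```

Here `spread` is 2 for kNN (it swings from best to worst between conditions) but only 1 for CART, illustrating how a single-condition ranking can mislead.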
“…Recently, researchers have had access to more methods than those used by Shepperd et al. For example, Menzies et al. [26] studied 158 methods. While that study used a very limited data set (just two old COCOMO data sets), its preliminary results prompted this study (where we work with 20 data sets).…”
Section: Ranking Instability (mentioning)
confidence: 99%
“…Following the suggestions of Shepperd and Kadoda [15] and Menzies et al. [13], we use a large set of data (9 datasets) in our experiment and use more robust evaluation criteria (5 error measures) [4], subject to the Wilcoxon rank-sum statistical test. This evaluation procedure is aimed at providing a stable conclusion in this comparison-based study.…”
Section: Discussion (mentioning)
confidence: 99%
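The excerpt above compares error measures under a Wilcoxon rank-sum test. As a rough illustration of the core statistic only (not the cited authors' actual procedure, and without tie correction in the variance), it can be computed in stdlib Python:

```python
import math

def rank_sum(x, y):
    """Wilcoxon rank-sum statistic W: sum of the (mid)ranks of sample x
    within the pooled, sorted values of x and y."""
    pooled = list(x) + list(y)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        j = i
        # extend over tied values and assign them all the average rank
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return sum(ranks[:len(x)])

def z_score(x, y):
    """Normal approximation for W (reasonable for moderate sample sizes;
    no tie correction applied to the variance)."""
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (rank_sum(x, y) - mean) / math.sqrt(var)
```

In practice one would use a tested library routine (e.g. SciPy's rank-sum test) rather than hand-rolling this; the sketch only shows what the test ranks.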
“…The results from many empirical studies [1,2,9,10] endorsed the effectiveness of dynamic-k over fixed-k methods, but some produced conflicting results [10]. This introduces conclusion instability, an issue widely recognized in software effort estimation, where different studies in the same area reach greatly diversified conclusions [6,13].…”
Section: Introduction (mentioning)
confidence: 99%
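The fixed-k versus dynamic-k contrast in the excerpt above refers to analogy-based estimation: predict a new project's effort from its nearest historical projects, with k either fixed in advance or chosen per target. A minimal, hypothetical sketch (the project data below is invented, not from the cited studies):

```python
import math

# Invented history of (feature_vector, effort) pairs for illustration only.
history = [((1.0, 2.0), 10.0), ((1.1, 2.1), 12.0),
           ((5.0, 6.0), 40.0), ((5.2, 5.9), 42.0)]

def estimate_fixed_k(target, k):
    """Fixed-k analogy: mean effort of the k nearest historical projects."""
    nearest = sorted(history, key=lambda h: math.dist(h[0], target))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)

def estimate_dynamic_k(target, radius):
    """Dynamic-k analogy (one possible scheme): use however many neighbours
    fall within `radius` of the target; fall back to the single nearest
    project if none do."""
    near = [e for f, e in history if math.dist(f, target) <= radius]
    if not near:
        near = [min(history, key=lambda h: math.dist(h[0], target))[1]]
    return sum(near) / len(near)
```

The "conflicting results" the excerpt mentions are plausible under this framing: the dynamic scheme's behaviour depends on an extra tuning choice (here, `radius`), so different datasets and settings can favour either variant.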
“…and we used a fixed, average fixing time value for each defect. Omitting the severity measure in defect prediction studies is a frequent practice [23]; however, it can be important when the simulated QA cost calculation is compared to real-life values. In our simulation we assumed equal severity for each defect, which is reflected in an equal, average cost (see Table 4).…”
Section: Threats To Validity (mentioning)
confidence: 99%