Abstract: Effort estimation often requires generalizing from a small number of historical projects. Generalization from such limited experience is an inherently underconstrained problem. Hence, the learned effort models can exhibit large deviations that prevent standard statistical methods (e.g., t-tests) from distinguishing the performance of alternative effort-estimation methods. The COSEEKMO effort-modeling workbench applies a set of heuristic rejection rules to comparatively assess results from alternative models. Using these rules, and despite the presence of large deviations, COSEEKMO can rank alternative methods for generating effort models. Based on our experiments with COSEEKMO, we advise a new view on supposed "best practices" in model-based effort estimation: 1) Each such practice should be viewed as a candidate technique which may or may not be useful in a particular domain, and 2) tools like COSEEKMO should be used to help analysts explore and select the best method for a particular domain.
Context: More than half the literature on software effort estimation (SEE) focuses on comparisons of new estimation methods. Surprisingly, there are no studies comparing the latest state-of-the-art methods with decades-old approaches. Objective: To check whether new SEE methods generate better estimates than older methods. Method: Firstly, collect effort estimation methods ranging from "classical" COCOMO (parametric estimation over a pre-determined set of attributes) to "modern" (reasoning via analogy using spectral-based clustering plus instance and feature selection, and a recent "baseline method" proposed in ACM Transactions on Software Engineering). Secondly, catalog the list of objections that led to the development of post-COCOMO estimation methods. Thirdly, characterize each of those objections as a comparison between newer and older estimation methods. Fourthly, run those comparison experiments using four COCOMO-style data sets (from 1991, 2000, 2005, 2010). Fifthly, compare the performance of the different estimators using a Scott-Knott procedure that uses (i) the A12 effect size to rule out "small" differences and (ii) a 99% confidence bootstrap procedure to check for statistically different groupings of treatments. Results: The major negative result of this paper is that, for the COCOMO data sets, nothing we studied did any better than Boehm's original procedure. Conclusions: When COCOMO-style attributes are available, we strongly recommend (i) using that data and (ii) using COCOMO to generate predictions. We say this since the experiments of this paper show that, at least for effort estimation, how data is collected is more important than what learner is applied to that data.
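The A12 effect size named in the abstract above is the Vargha-Delaney statistic: the probability that a value drawn from one sample exceeds a value drawn from the other, with ties counted as one half. The following is a minimal sketch of that statistic only, not the paper's Scott-Knott or bootstrap code. The function names and the 0.6 cut-off for a "small" difference are assumptions made for illustration.

```python
def a12(xs, ys):
    """Vargha-Delaney A12: chance that a value from xs is larger than one from ys.

    Ties count as half a win. 0.5 means the two samples are indistinguishable.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(xs) * len(ys))


def is_small_difference(xs, ys, small=0.6):
    """True when the effect is below the cut-off in either direction.

    The default cut-off of 0.6 is an assumed value, not one taken from the paper.
    """
    effect = a12(xs, ys)
    return max(effect, 1.0 - effect) < small


if __name__ == "__main__":
    # Hypothetical error scores for two estimators (made-up numbers).
    old_method = [0.20, 0.25, 0.30, 0.35]
    new_method = [0.22, 0.24, 0.31, 0.36]
    print(a12(new_method, old_method), is_small_difference(new_method, old_method))
```

In a ranking procedure of the kind described, a check like this would be applied before any significance test, so that treatments separated by only a trivial effect are kept in the same group.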
Adoption of advanced automated SE (ASE) tools would be more favored if a business case could be made that these tools are more valuable than alternate methods. In theory, software prediction models can be used to make that case. In practice, this is complicated by the "local tuning" problem. Normally, predictors for software effort, defects, and threats use local data to tune their predictions. Such local tuning data is often unavailable. This paper shows that assessing the relative merits of different SE methods need not require precise local tunings. STAR1 is a simulated annealer plus a Bayesian post-processor that explores the space of possible local tunings within software prediction models. STAR1 ranks project decisions by their effects on effort, defects, and threats. In experiments with NASA systems, STAR1 found one project where ASE tools were essential for minimizing effort, defects, and threats, and another project where ASE tools were merely optional.
COCONUT calibrates effort estimation models using an exhaustive search over the space of calibration parameters in a COCOMO I model. This technique is much simpler than other effort estimation methods yet yields PRED levels comparable to those of other methods. Also, it does so with less project data and fewer attributes (no scale factors). However, a comparison between COCONUT and other methods is complicated by differences in the experimental methods used for effort estimation. A review of those experimental methods concludes that software effort estimation models should be calibrated to local data using incremental holdout (not jackknife) studies, combined with randomization and hypothesis testing, repeated a statistically significant number of times.
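The calibration idea in the abstract above can be sketched as a grid search over the two COCOMO I calibration parameters, scored by PRED(30). This is a minimal illustration, not COCONUT itself: the grid ranges, the step sizes, the function names, and the synthetic projects are all assumptions. The only parts taken as given are the COCOMO I form, effort = a * KLOC^b * (product of effort multipliers), and the usual definition of PRED(N) as the fraction of estimates within N% of the actual value.

```python
def pred(actuals, estimates, threshold=0.30):
    """PRED(N): fraction of estimates whose relative error is at most threshold."""
    hits = sum(
        1 for actual, est in zip(actuals, estimates)
        if abs(actual - est) / actual <= threshold
    )
    return hits / len(actuals)


def cocomo_effort(a, b, kloc, em_product=1.0):
    """COCOMO I form: effort = a * KLOC^b * product of effort multipliers."""
    return a * (kloc ** b) * em_product


def calibrate(projects, a_values, b_values):
    """Try every (a, b) pair on the grid and keep the first pair with the best PRED(30).

    projects is a list of (kloc, em_product, actual_effort) tuples.
    Returns (best_score, best_a, best_b).
    """
    actuals = [actual for _, _, actual in projects]
    best = None
    for a in a_values:
        for b in b_values:
            estimates = [cocomo_effort(a, b, kloc, em) for kloc, em, _ in projects]
            score = pred(actuals, estimates)
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


# Assumed grid: a in [2.0, 4.0] step 0.1, b in [0.90, 1.30] step 0.01.
A_GRID = [round(2.0 + 0.1 * i, 2) for i in range(21)]
B_GRID = [round(0.90 + 0.01 * i, 2) for i in range(41)]

# Synthetic projects generated from a=3.0, b=1.12 (made-up data for illustration).
PROJECTS = [(k, 1.0, cocomo_effort(3.0, 1.12, k)) for k in (10, 20, 50, 100)]

if __name__ == "__main__":
    print(calibrate(PROJECTS, A_GRID, B_GRID))
```

Because several grid points can reach the same PRED(30), the pair returned is only the first best one found. In a holdout study of the kind the abstract recommends, the search would run on training projects and the score would be reported on projects held back from calibration.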
There exists a large and growing number of proposed estimation methods but little conclusive evidence ranking one method over another. Prior effort estimation studies suffered from "conclusion instability", where the rankings offered to different methods were not stable across (a) different evaluation criteria; (b) different data sources; or (c) different random selections of that data. This paper reports a study of 158 effort estimation methods on data sets based on COCOMO features. Four "best" methods were detected that were consistently better than the "rest" of the other 154 methods. These rankings of "best" and "rest" methods were stable across (a) three different evaluation criteria applied to (b) multiple data sets from two different sources that were (c) divided into hundreds of randomly selected subsets using four different random seeds. Hence, while there exists no single universal "best" effort estimation method, there appears to exist a small number (four) of most useful methods. This result both complicates and simplifies effort estimation research. The complication is that any future effort estimation analysis should be preceded by a "selection study" that finds the best local estimator. However, the simplification is that such a study need not be labor intensive, at least for COCOMO style data sets. (Autom Softw Eng (2010) 17: 409-437)