To date, many software engineering cost models have been developed to predict the cost, schedule, and quality of software under development. However, the rapidly changing nature of software development has made it extremely difficult to develop empirical models that continue to yield high prediction accuracies. Software development costs continue to rise, and practitioners repeatedly express concern over their inability to predict those costs accurately. Thus, one of the most important objectives of the software engineering community has been to develop useful models that constructively explain the software development life-cycle and accurately predict the cost of developing a software product. To that end, many parametric software estimation models have evolved over the last two decades [Putnam92, Jones97, Park88, Jensen83, Rubin83, Boehm81, Boehm95, Walkerden97, Conte86, Fenton91, Masters85, Mohanty81].

Almost all of these parametric models have been empirically calibrated against actual data from completed software projects. The technique most commonly used for this calibration is classical multiple regression. As discussed in this paper, multiple regression imposes several assumptions that software engineering datasets frequently violate. Moreover, the source data are generally imprecise in their reporting of size, effort, and cost-driver ratings, particularly across different organizations. The result is inaccurate empirical models that perform poorly when used for prediction. This paper illustrates the problems faced by the multiple regression approach in calibrating one of the popular software engineering cost models, COCOMO II. It describes the pragmatic 10% weighted-average approach used for the first publicly available calibrated version [Clark98]. It then shows how a more sophisticated Bayesian approach can alleviate some of the problems faced by multiple regression. Finally, it compares and contrasts the two empirical approaches and concludes that the Bayesian approach is better and more robust than the multiple regression approach.

Bayesian analysis is a well-defined and rigorous process of inductive reasoning that has been used in many scientific disciplines [see Gelman95, Zellner83, Box73 for a broader treatment of the Bayesian analysis approach]. A distinctive feature of the Bayesian approach is that it permits the investigator to use both sample (data) information and prior (expert-judgement) information in a logically consistent manner when making inferences. This is done by using Bayes' theorem to produce a 'post-data' or posterior distribution for the model parameters: prior (or initial) estimates are transformed into post-data views, a transformation that can be viewed as a learning process. The posterior distribution is determined by the variances of the prior and sample information. If the variance of the prior information is smaller than the variance of the sample information, a higher weight is assigned to the prior information; conversely, if the sample information is the more precise, it dominates the posterior.
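To make this variance-based weighting concrete, the sketch below works through the textbook normal-normal case for a single model parameter. It is an illustration of the general principle only, not the actual COCOMO II calibration procedure, and the numbers in the example are hypothetical: the posterior precision (reciprocal variance) is the sum of the prior and sample precisions, and the posterior mean is their precision-weighted average.

```python
# Minimal sketch of Bayesian updating for a single model parameter,
# assuming a normal prior and a normal sampling distribution
# (illustrative only; not the COCOMO II calibration code).

def posterior(prior_mean, prior_var, sample_mean, sample_var):
    """Combine prior and sample information by precision weighting.

    Precision is the reciprocal of variance, so the source with the
    smaller variance (higher precision) receives the larger weight.
    """
    prior_prec = 1.0 / prior_var
    sample_prec = 1.0 / sample_var
    post_var = 1.0 / (prior_prec + sample_prec)
    post_mean = post_var * (prior_prec * prior_mean +
                            sample_prec * sample_mean)
    return post_mean, post_var

# Hypothetical example: expert judgement (prior) puts a cost-driver
# coefficient at 1.0 with variance 0.04; regression on project data
# (sample) estimates 1.4 with variance 0.16.
mean, var = posterior(1.0, 0.04, 1.4, 0.16)
print(mean, var)  # posterior mean 1.08, variance 0.032
```

Because the prior variance (0.04) is a quarter of the sample variance (0.16) in this example, the posterior mean (1.08) lands much closer to the prior mean (1.0) than to the regression estimate (1.4), which is exactly the weighting behaviour described above.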