Abstract-Background: Despite decades of research, there is no consensus on which software effort estimation methods produce the most accurate models. Aim: Prior work has reported that, given M estimation methods, no single method consistently outperforms all others. Perhaps rather than recommending one estimation method as best, it is wiser to generate estimates from ensembles of multiple estimation methods. Method: 9 learners were combined with 10 pre-processing options to generate 9 × 10 = 90 solo-methods. These were applied to 20 data sets and evaluated using 7 error measures. This identified the best n (in our case n = 13) solo-methods that showed stable performance across multiple datasets and error measures. The top 2, 4, 8 and 13 solo-methods were then combined to generate 12 multi-methods, which were then compared to the solo-methods. Results: (i) The top 10 (out of 12) multi-methods significantly out-performed all 90 solo-methods. (ii) The error rates of the multimethods were significantly less than the solo-methods. (iii) The ranking of the best multi-method was remarkably stable. Conclusion: While there is no best single effort estimation method, there exist best combinations of such effort estimation methods.
Abstract-Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek aspects on general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogy-based effort estimation: i.e. the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of super-trees vs smaller sub-trees. Results: For ten data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: (a) the estimation variance of cluster sub-trees is usually larger than that of cluster super-trees; (b) if analogy is restricted to the cluster trees with lower variance then effort estimates have a significantly lower error (measured using MRE and a Wilcoxon test, 95% confidence, compared to nearest-neighbor methods that use neighborhoods of a fixed size). Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.