Exploiting the Essential Assumptions of Analogy-Based Effort Estimation

Kocagüneli, Ekrem; Menzies, Tim; Bener, Ayşe; Keung, Jacky

doi:10.1109/tse.2011.27

Cited by 181 publications (164 citation statements)

References 53 publications

Citation statements: Supporting, Mentioning (163), Contrasting

“…Therefore, the terms CC/WC should not be considered as synonyms of heterogeneous/homogeneous [30], and the possible heterogeneity of WC projects should be tackled. Menzies et al [21] and Minku and Yao [34] investigated the use of tree-based SEE models to tackle heterogeneity in general, i.e., not restricted to CC projects. Other local approaches such as k-nearest neighbours [2,40] could also be seen as tackling heterogeneity.…”

Section: Related Work (mentioning, confidence: 99%)
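The k-nearest neighbours approach mentioned in the statement above is the core of analogy-based estimation: a new project's effort is predicted from its most similar past projects. A minimal Python sketch, assuming numeric features already normalised to a common scale, Euclidean distance, and an unweighted mean over the k analogues; the function name `knn_effort` and the sample data are hypothetical.

```python
import math
from statistics import mean

def knn_effort(train, query, k=3):
    """Estimate effort as the mean effort of the k training projects nearest to `query`.

    `train` is a list of (features, effort) pairs; features are assumed to be
    numeric tuples already normalised to a common scale.
    """
    # Rank past projects by Euclidean distance to the new project.
    ranked = sorted(train, key=lambda project: math.dist(project[0], query))
    # Average the efforts of the k closest analogues.
    return mean(effort for _, effort in ranked[:k])

# Hypothetical, already-normalised projects: ((feature1, feature2), effort)
train = [((0.1, 0.2), 10), ((0.2, 0.1), 12), ((0.9, 0.9), 100), ((0.8, 0.95), 90)]
print(knn_effort(train, (0.15, 0.15), k=2))  # prints 11
```

Varying `k` trades off how many analogues contribute against how similar they are to the new project: with `k=3` here, a distant project with effort 90 is pulled into the average.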

“…Kocaguneli et al [22] investigated a tree-based filtering mechanism called TEAK [21] to tackle heterogeneity. This mechanism creates trees to represent training projects and provide effort estimations.…”

Section: Related Work (mentioning, confidence: 99%)
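The statement above describes TEAK only as building trees over the training projects and using them to estimate effort. The sketch below is a simplified illustration in that spirit, not TEAK as published. It builds a binary tree by greedy agglomerative clustering and drops every project whose smallest enclosing cluster has unusually high effort variance. It then rebuilds the tree and estimates by descending toward the new project while effort variance does not increase. The pruning rule, the `alpha` threshold, the stopping rule, all function names, and the data are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Optional

@dataclass
class Node:
    items: list                       # (features, effort) pairs under this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def centroid(self):
        return tuple(mean(col) for col in zip(*(f for f, _ in self.items)))

    def variance(self):
        return pvariance([effort for _, effort in self.items])

def gac(projects):
    """Greedy agglomerative clustering: merge the two closest clusters (by centroid) until one remains."""
    nodes = [Node([p]) for p in projects]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        i, j = min(pairs, key=lambda ij: math.dist(nodes[ij[0]].centroid(), nodes[ij[1]].centroid()))
        merged = Node(nodes[i].items + nodes[j].items, nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]

def prune(root, alpha=1.0):
    """Assumed rule: drop each project whose smallest enclosing cluster has variance > alpha * root variance."""
    if root.left is None:
        return list(root.items)
    limit = alpha * root.variance()
    kept = []

    def walk(node):
        for child in (node.left, node.right):
            if child.left is None:            # child is a single project
                if node.variance() <= limit:
                    kept.extend(child.items)
            else:
                walk(child)

    walk(root)
    return kept

def estimate(root, query):
    """Descend toward the nearer child while it is no more variable than its parent; return the mean effort."""
    node = root
    while node.left is not None:
        child = min((node.left, node.right), key=lambda c: math.dist(c.centroid(), query))
        if child.variance() > node.variance():
            break
        node = child
    return mean(effort for _, effort in node.items)

# Hypothetical projects: ((feature1, feature2), effort); (2.5, 2.5) is an effort outlier.
projects = [((1.0, 1.0), 10), ((1.0, 1.5), 12), ((1.6, 1.0), 11),
            ((2.5, 2.5), 200), ((8.0, 8.0), 100), ((8.8, 8.0), 105)]
print(estimate(gac(projects), (1.2, 1.2)))   # prints 73 (outlier keeps the descent at the root)
kept = prune(gac(projects))
print(estimate(gac(kept), (1.2, 1.2)))       # prints 11
```

Pruning on effort variance reflects the assumption that nearby projects should have similar effort: a cluster that is close in feature space but widely spread in effort makes its members unreliable analogues.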

“…This clustering method has been chosen because hierarchical SEE models have obtained promising results for tackling heterogeneity [21,34]. It also has the advantage of being a deterministic method, i.e., it will always retrieve the same clusters when the same projects and features are used.…”

Section: Clustering Dycom (mentioning, confidence: 99%)


Clustering Dycom. Minku, Hou (2017). Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering.


Background: Software Effort Estimation (SEE) can be formulated as an online learning problem, where new projects are completed over time and may become available for training. In this scenario, a Cross-Company (CC) SEE approach called Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving the high cost of collecting such training projects. However, Dycom relies on splitting CC projects into different subsets in order to create its CC models. Such splitting can have a significant impact on Dycom's predictive performance. Aims: This paper investigates whether clustering methods can be used to help find good CC splits for Dycom. Method: Dycom is extended to use clustering methods for creating the CC subsets. Three different clustering methods are investigated, namely Hierarchical Clustering, K-Means, and Expectation-Maximisation (EM). Clustering Dycom is compared against the original Dycom with CC subsets of different sizes, based on four SEE databases. A baseline WC model is also included in the analysis. Results: Clustering Dycom with K-Means can potentially help to split the CC projects, achieving similar or better predictive performance than Dycom. However, K-Means still requires the number of CC subsets to be pre-defined, and a poor choice can negatively affect predictive performance. EM enables Dycom to set the number of CC subsets automatically while maintaining or improving predictive performance with respect to the baseline WC model. Clustering Dycom with Hierarchical Clustering did not offer a significant advantage in terms of predictive performance. Conclusion: Clustering methods can be an effective way to automatically generate Dycom's CC subsets.
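The abstract describes replacing Dycom's manually chosen CC splits with a clustering method. A minimal sketch of that splitting step only, assuming Lloyd's k-means over numeric CC project features with randomly chosen initial centroids; `kmeans_split` and its parameters are hypothetical, and training Dycom's CC models on each subset is not shown. As the abstract notes, `k` must be fixed in advance.

```python
import math
import random
from statistics import mean

def kmeans_split(projects, k, seed=0, max_iter=100):
    """Split CC projects (numeric feature tuples) into k subsets with Lloyd's k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(projects, k)        # k distinct projects as initial centroids
    for _ in range(max_iter):
        # Assignment step: each project joins the subset of its nearest centroid.
        subsets = [[] for _ in range(k)]
        for p in projects:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            subsets[nearest].append(p)
        # Update step: move each centroid to the mean of its subset (keep it if the subset is empty).
        updated = [tuple(mean(col) for col in zip(*s)) if s else centroids[c]
                   for c, s in enumerate(subsets)]
        if updated == centroids:               # converged: assignments will not change
            break
        centroids = updated
    return subsets

# Hypothetical CC projects described by two normalised features.
cc_projects = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (10.0, 10.0), (10.5, 9.5), (9.8, 10.2)]
for subset in kmeans_split(cc_projects, k=2):
    print(subset)
```

Because the initial centroids depend on `seed`, less well separated data can yield different splits for different seeds, unlike the deterministic hierarchical clustering described in the citation statement above.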


“…A recent study by Kocaguneli et al [43] concluded that an estimation that relies on a smaller number of more relevant analogues will result in better estimation performance than relying on a larger number of less relevant analogues. Our results are in agreement with Kocaguneli et al as LSA-X achieved significantly improved estimation performance.…”

Section: And Why Does It Work? (mentioning, confidence: 99%)

LSA-X: Exploiting Productivity Factors in Linear Size Adaptation for Analogy-Based Software Effort Estimation. Phannachitta, Monden, Keung, et al. (2016). IEICE Trans. Inf. & Syst.


Summary: Analogy-based software effort estimation has gained a considerable amount of attention in current research and practice. Its excellent estimation accuracy relies on its solution adaptation stage, where an effort estimate is produced from similar past projects. This study proposes a solution adaptation technique named LSA-X that introduces an approach to exploit the potential of productivity factors, i.e., project variables with a high correlation with software productivity, in the solution adaptation stage. The LSA-X technique tailors the exploitation of the productivity factors with a procedure based on the Linear Size Adaptation (LSA) technique. The results, based on 19 datasets, show that in circumstances where a dataset exhibits a high correlation coefficient between productivity and a related factor (r ≥ 0.30), the proposed LSA-X technique statistically outperformed (95% confidence) the eight other commonly used techniques compared in this study. In other circumstances, our results suggest using any linear adaptation technique based on software size to compensate for the limitations of the LSA-X technique.
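The summary builds on Linear Size Adaptation (LSA), which rescales each analogue's effort by the ratio of the target project's size to the analogue's size before averaging. A minimal sketch, assuming the analogues have already been retrieved and that productivity is measured as size divided by effort; the function names and data are hypothetical, and the productivity-factor procedure that defines LSA-X itself is not reproduced here. The Pearson helper shows how the r ≥ 0.30 condition from the summary could be checked for a candidate factor.

```python
import math
from statistics import mean

def lsa_estimate(target_size, analogues):
    """Linear Size Adaptation: scale each analogue's effort by target_size / analogue_size, then average.

    `analogues` is a list of (size, effort) pairs for the retrieved similar projects.
    """
    return mean(effort * target_size / size for size, effort in analogues)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A 200-unit target estimated from a 100-unit analogue (effort 50) and a 400-unit analogue (effort 300).
print(lsa_estimate(200, [(100, 50), (400, 300)]))  # prints 125.0

# Checking a hypothetical candidate productivity factor against the r >= 0.30 condition.
sizes, efforts = [100, 200, 300, 400], [50, 80, 100, 120]
team_experience = [1, 3, 4, 6]
productivity = [s / e for s, e in zip(sizes, efforts)]
print(pearson(productivity, team_experience) >= 0.30)
```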


“…Data-intensive analogy based software effort prediction gained popularity in the late 1990's by Shepperd and Schofield. Recently, Kocaguneli et al (2011) proposed a method to improve Analogy based software estimation. Empirical experiments using the tools such as ESTOR and ANGEL (Keung, 2008) show that the estimation by analogy is a viable alternative to predict accuracy and flexibility.…”

Section: Introduction (mentioning, confidence: 99%)

Detection of Aberrant Data Points for an effective Effort Estimation using an Enhanced Algorithm with Adaptive Features. M. (2012). Journal of Computer Science.


Problem statement: The spiraling growth of the IT industry has witnessed an unprecedented change in the software development paradigm, from algorithmic models to machine learning techniques. At present, there are no standard methods to predict the accuracy of software cost estimation, which is an important goal of the software community. Approach: This study proposes a simple and systematic algorithmic procedure for analogy-based software cost prediction to detect aberrant data points. The algorithm is analyzed and correlated with the Desharnais and NASA datasets, which contain all adaptive features with numerical and categorical variables. Results: The interpreted curves for these datasets show a discernible anomaly for the dataset with more categorical variables, thereby indicating the erroneous data points. Conclusion: Eliminating aberrant data points using the new algorithmic method improves the accuracy of software cost estimation using historical data sets.
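The abstract does not spell out its detection algorithm, so the sketch below uses a different, widely used technique as a stand-in rather than the paper's method. It flags projects whose productivity (assumed here to be size divided by effort) lies far from the median, using the MAD-based modified z-score 0.6745 × |x - median| / MAD with 3.5 as a conventional cut-off. The function name and data are hypothetical.

```python
from statistics import median

def flag_aberrant(sizes, efforts, threshold=3.5):
    """Return indices of projects whose productivity is an outlier by the MAD-based modified z-score."""
    productivity = [s / e for s, e in zip(sizes, efforts)]   # assumed definition: size / effort
    med = median(productivity)
    mad = median(abs(p - med) for p in productivity)         # median absolute deviation
    if mad == 0:              # no spread: the score is undefined, so flag nothing
        return []
    return [i for i, p in enumerate(productivity)
            if 0.6745 * abs(p - med) / mad > threshold]

# Hypothetical projects: the last one has an implausibly low recorded effort.
sizes = [100, 100, 100, 100, 100]
efforts = [10, 11, 9, 10, 1]
print(flag_aberrant(sizes, efforts))  # prints [4]
```

Median and MAD are used instead of mean and standard deviation because a single extreme project would otherwise inflate the spread and mask itself.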
