Research patterns and trends in software effort estimation

Sehra, Sumeet Kaur; Brar, Yadwinder Singh; Kaur, Navdeep; Sehra, Sukhjit Singh

doi:10.1016/j.infsof.2017.06.002

Cited by 80 publications

(51 citation statements)

References 181 publications

“…It is a challenging and substantial activity when managing a software project. The challenge arises due to the complex relationship between effort and various software attributes related to the personal, product, and/or platforms used in the project [1], [2]. Machine learning (ML) based estimation techniques are gaining increasing attention in SDEE research, as they can model the complex relationship between effort and software attributes (cost drivers), especially when this relationship is not linear and does not seem to have any predetermined form [2].…”

Section: Introduction (mentioning)

confidence: 99%

Improved Analogy-based Effort Estimation with Incomplete Mixed Data

Abnane, Idri (2018). Annals of Computer Science and Information Systems

Estimation by analogy (EBA) is one of the most attractive software development effort estimation techniques. However, one of the critical issues when using EBA is the occurrence of missing data (MD) in the historical data sets. The absence of values of several relevant software attributes is a frequent phenomenon that may cause inaccurate EBA estimations. The MD can be numerical and/or categorical. This paper evaluates four MD techniques (toleration, deletion, k-nearest neighbors (KNN) imputation and support vector regression (SVR) imputation) over four mixed data sets. A total of 432 experiments were conducted involving four MD techniques, nine MD percentages (from 10% to 90%), three missingness mechanisms (MCAR: Missing Completely at Random, MAR: Missing at Random and NIM: Non-Ignorable Missing) and four data sets. The evaluation process consists of four steps and uses several accuracy measures such as standardized accuracy (SA) and prediction level (Pred). The results suggest that EBA with imputation techniques achieved significantly better SA values than EBA with toleration or deletion, regardless of the mechanism of missingness. Moreover, no particular MD imputation technique outperformed the other techniques overall. However, according to Pred and other accuracy criteria, EBA with SVR was the best, followed by KNN imputation; we also found that toleration instead of deletion improves the accuracy of EBA.
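The abstract names the prediction level (Pred) without defining it. A minimal sketch follows, assuming the definition commonly used in effort estimation work: Pred(l) is the fraction of projects whose magnitude of relative error (MRE) is at most l, with l = 0.25 as a typical threshold. The function names and the effort values are illustrative only and are not taken from the paper.

```python
def mre(actual, estimated):
    """Magnitude of relative error for a single project (actual must be > 0)."""
    return abs(actual - estimated) / actual


def pred(actuals, estimates, level=0.25):
    """Fraction of projects whose MRE is at most `level` (assumed definition)."""
    if len(actuals) != len(estimates) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return hits / len(actuals)


# Made-up efforts (e.g. person-hours), purely for illustration.
actual_effort = [100.0, 200.0, 400.0, 800.0]
estimated_effort = [110.0, 300.0, 380.0, 500.0]

print(pred(actual_effort, estimated_effort))
```

A higher Pred value means more estimates fall within the chosen tolerance; the measure says nothing about how far off the remaining estimates are, which is one reason studies report it alongside other criteria such as SA.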

“…The intensive and increasing use of EBA is due to its several advantages including simplicity, mimicking human reasoning, ease to understand and no assumption is made about the form of the relationship [1], [4]- [10]. Moreover, EBA can handle both quantitative and qualitative data [5]- [7], [11], [12].…”

Section: Introduction (mentioning)

confidence: 99%

Improved Analogy-based Effort Estimation with Incomplete Mixed Data

Abnane, Idri (2018). Annals of Computer Science and Information Systems

“…Reliable software effort, cost, and time estimates are essential inputs for making decisions about investment, staffing, scheduling, and other planning and controlling of software projects. Thus, the software engineering research has introduced and investigated many approaches to software estimation, which is evidenced via systematic literature reviews, surveys, and model assessment studies. Model‐based estimation approaches like COCOMO have been widely investigated in the software engineering literature and continue to be used in practice with commercial models such as SEER‐SIM, Price‐S, and SLIM despite the increasing application of agile methods in software development.…”

Section: Discussion (mentioning)

confidence: 99%

“…Thus, the software engineering research community has introduced and evaluated many approaches to making reliable cost, effort, and other related predictions. The most popular approach is model‐based estimation which uses some algorithms and historical data to compute estimates…”

Section: Introduction (mentioning)

confidence: 99%

Investigating the use of duration‐based windows and estimation by analogy for COCOMO

Nguyen, Huynh, Boehm, et al. (2019). J Software Evolu Process

In model-based software estimation, using the right training data is a key contributor for making accurate predictions, which is crucial for the success of software projects. This study investigates the use of duration-based windows and estimation by analogy to calibrate COCOMO and assess their estimation performance. We compare these approaches as well as the use of all available historical data using the COCOMO data set of 341 projects and NASA data set of 93 projects. The results show that timing information exists in the data sets affecting estimation accuracy. Given sufficient data for calibration, using recently completed projects within short durations generates more accurate estimates than retaining all historical data or using k-nearest neighbors based on estimation by analogy. More training data spanning a long period of time may not lead to improved estimation accuracy. This study offers evidence to support the use of projects completed within recent years for training estimation models.

KEYWORDS: COCOMO, duration-based windows, estimation by analogy, k-nearest neighbors, moving windows, software estimation

INTRODUCTION

Cost and effort estimation is a key activity in software project management that can affect the outcome of software projects. Inaccurate cost estimates can lead to proposal rejection, financial losses, project management problems, and overall project failure [1-3]. Thus, the software engineering research community has introduced and evaluated many approaches to making reliable cost, effort, and other related predictions [4]. The most popular approach is model-based estimation which uses some algorithms and historical data to compute estimates [5]. Effort estimation models are often built or calibrated to past projects in organizations to compute the effort estimates of new projects. Thus, the performance of such models depends much on the relevance of training data.

One important question is whether legacy data of past projects is useful for training estimation models. Existing studies proposed chronology-based approaches to splitting training data and investigating this challenging question [6][7][8]. These studies assume the basis that given a project p to be estimated, the model to estimate p is built using data selected from projects completed prior to the start of p. Kitchenham et al [7], which is one of the first studies that split training data chronologically, suggested the use of 30 most recent projects instead of all historical data in an organization to train and build regression models for estimation. Song et al [9] and Minku and Yao [10] investigated estimation methods that fetch one project at a time chronologically to build models, showing that parameters of the best resulting models change over time.

Lokan and Mendes [6] investigated a chronological splitting method called moving windows in which estimation models are built using windows of n most recent projects (fixed-size windows). Lokan and Mendes [11] studied the effects of using windows of all projects completed within periods immediat...

“…Similar work on various research areas has been performed using LSA, as in [16] LSA applied to understand the trend analysis of behavioral operation in supply chain management. In [17], LDA employed to understand the research trends and topics in software effort estimation. In [18], proposed a method for topic identification in web documents using web design features.…”

Section: Related Work (mentioning)

confidence: 99%

A Trend Analysis of Machine Learning Research with Topic Models and Mann-Kendall Test

Sharma, Kumar, Chand (2019). IJISA

This paper aims to systematically examine the literature of machine learning for the period of 1968~2017 to identify and analyze the research trends. A list of journals from well-established publishers ScienceDirect, Springer, JMLR, IEEE (approximately 23,365 journal articles) related to machine learning is used to prepare a content collection. To the best of our information, it is the first effort to comprehend the trend analysis in machine learning research with topic models: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and LDA with Coherent Model (LDA_CM). The LDA_CM topic model gives the highest topic coherence amongst all topic models under consideration. This study provides a scientific ground that helps to overcome the subjectivity of collective opinion. The Mann-Kendall test is used to understand the trend of the topics. Our findings provide indicative of paradigmatic shifts in research methodology of significant patterns of topical prominence and the evolving research areas. It is used to highlight the evolution regarding the previous and recent trends in research topics in the area of machine learning. Understanding such an intellectual structure and future trends will assist the researchers to adopt the divergent developments of this research in one place. This paper analyzes the overall trends of the machine learning research since 1968, based on the latent topics identified in the period of 2007~2017 that may be helpful to the researchers exploring the recommended areas and publish their research articles.
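The abstract applies the Mann-Kendall test to topic trends but does not spell out the computation. The sketch below shows the textbook form of the test under stated assumptions: no correction for tied values, the normal approximation for the statistic, and a two-sided p-value. The function name and the yearly topic shares are hypothetical and are not taken from the paper.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Simplified sketch: uses the no-ties variance formula and the normal
    approximation, so it is only indicative for short or heavily tied series.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S sums the signs of all pairwise differences x_j - x_i with i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S when there are no ties.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical share of one topic per year, purely for illustration.
topic_share = [0.02, 0.03, 0.05, 0.04, 0.07, 0.09]
s, z, p = mann_kendall(topic_share)
print(f"S={s}, Z={z:.3f}, p={p:.4f}")
```

A positive S with a small p-value suggests an increasing trend for the topic, a negative S a decreasing one; with real topic-model output containing tied values, a tie-corrected variance would be the safer choice.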
