Software testing is crucial in continuous integration (CI). Ideally, at every commit, all the test cases should be executed and, moreover, new test cases should be generated for the new source code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the continuous integration pipeline. In this context, developers want to achieve a certain minimum level of coverage for every software build. However, executing all the test cases and, moreover, generating new ones for all the classes at every commit is not feasible. As a consequence, developers have to select which subset of classes should be tested and/or targeted by test-case generation. We argue that knowing a priori the branch coverage that can be achieved with test-data generation tools can help developers make informed decisions about those issues. In this paper, we investigate the possibility of using source-code metrics to predict the coverage achieved by test-data generation tools. We use four different categories of source-code features and assess the prediction on a large data set involving more than 3,000 Java classes. We compare different machine learning algorithms and conduct a fine-grained feature analysis aimed at investigating the factors that most impact the prediction accuracy. Moreover, we extend our investigation to four different search budgets. Our evaluation shows that the best model achieves an average MAE of 0.15 and 0.21 on nested cross-validation over the different budgets for EVOSUITE and RANDOOP, respectively. Finally, the discussion of the results demonstrates the relevance of coupling-related features for the prediction accuracy.

KEYWORDS: automated software testing, coverage prediction, machine learning, software testing
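To make the evaluation setup concrete, the following is a minimal sketch of how a coverage-prediction model of this kind could be trained and assessed with nested cross-validation and MAE, assuming a Python/scikit-learn stack. The feature names, the random data, the choice of regressor, and the hyper-parameter grid are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (illustrative, not the paper's exact pipeline): predicting
# the branch coverage achieved by a test-data generation tool from
# source-code metrics, evaluated with nested cross-validation and MAE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)

# Placeholder feature matrix: one row per Java class, one column per
# source-code metric (e.g., size, complexity, coupling-related metrics).
X = rng.random((3000, 4))
# Placeholder target: branch coverage in [0, 1] reached by the tool
# (e.g., EvoSuite or Randoop) under a fixed search budget.
y = rng.random(3000)

# Inner loop tunes hyper-parameters; outer loop gives an unbiased error estimate.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

model = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="neg_mean_absolute_error",
    cv=inner_cv,
)

# Nested cross-validation: MAE of the tuned model on held-out outer folds.
scores = cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=outer_cv)
print(f"average MAE: {-scores.mean():.2f}")
```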
INTRODUCTION

Software testing is widely recognized as a crucial task in any software development process,1 estimated to account for at least half of the entire development cost.2,3 In recent years, we have witnessed a wide adoption of continuous integration (CI) practices, where new or changed code is integrated extremely frequently into the main codebase. Testing plays an important role in such a pipeline: in an ideal world, at every single commit, every test case of the system should be executed (regression testing). Moreover, additional test cases might be automatically generated to test all the new or modified code introduced into the main codebase.4 This is especially true in a Continuous Test Generation (CTG) environment, where the generation of test cases is directly integrated into the continuous integration cycle.4 However, because of the time constraints between frequent commits, complete regression testing is not feasible for large projects.5 Furthermore, even test suite augmentation,6 i.e., the automatic generation of tests that considers code changes and their effect on the previous codebase, is hardly feasible because of the extensive amount of time needed to generate tests for just a single class. As developers want to ensure a cer...