2021
DOI: 10.1109/tse.2019.2945020
How to “DODGE” Complex Software Analytics

Abstract: Machine learning techniques applied to software engineering tasks can be improved by hyperparameter optimization, i.e., automatic tools that find good settings for a learner's control parameters. We show that such hyperparameter optimization can be unnecessarily slow, particularly when the optimizers waste time exploring "redundant tunings", i.e., pairs of tunings which lead to indistinguishable results. By ignoring redundant tunings, DODGE(E), a tuning tool, runs orders of magnitude faster, while also generat…
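The core idea in the abstract — down-weighting tunings that land within some tolerance ε of an already-seen result — can be illustrated with a minimal sketch. This is an assumed simplification for illustration, not the authors' implementation: the objective function, option grid, and weighting factors below are all hypothetical.

```python
import random

def evaluate(tuning):
    # Stand-in objective: a hypothetical learner's score for a tuning,
    # rounded so that nearby tunings collapse to the same result.
    return round(sum(tuning.values()), 1)

def dodge(options, budget=30, epsilon=0.2, seed=1):
    """Sketch of DODGE-style search: options that keep producing results
    within epsilon of something already seen are treated as redundant
    and down-weighted, so later samples explore elsewhere."""
    rng = random.Random(seed)
    weights = {name: {v: 1.0 for v in vals} for name, vals in options.items()}
    seen, best = [], None
    for _ in range(budget):
        tuning = {name: rng.choices(list(w), weights=w.values())[0]
                  for name, w in weights.items()}
        score = evaluate(tuning)
        redundant = any(abs(score - s) < epsilon for s in seen)
        seen.append(score)
        for name, value in tuning.items():
            # Punish options that led somewhere already visited;
            # reward options that produced a genuinely new result.
            weights[name][value] *= 0.5 if redundant else 1.5
        if best is None or score > best[0]:
            best = (score, tuning)
    return best

best_score, best_tuning = dodge({"a": [0.1, 0.5, 0.9], "b": [0.0, 0.3, 0.6]})
print(best_score, best_tuning)
```

The point of the sketch is that most of the evaluation budget is steered away from regions of the tuning space that cannot change the outcome by more than ε, which is where the claimed speedup comes from.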


Cited by 42 publications (56 citation statements). References 72 publications (146 reference statements).
“…learning on static code attributes such as C.K. and McCabe metrics) [1], [3], [18], [20], [33], [36], [44], [57], [64], [65], [72], [74], [80], [96] that are more granular and high-dimensional.…”
Section: Future Work (mentioning)
confidence: 99%
“…Finally, data mining technology keeps evolving. Agrawal et al. [1] recently argued that for any dataset where FFTs are effective, there is a better algorithm (which they call DODGE( )). Moreover, Yang et al. [109] designed a deep belief network to generate higher-quality metrics from the metrics given by Kamei et al. [48].…”
Section: Future Work (mentioning)
confidence: 99%
“…Our preliminary analysis shows that model building with even the smallest defect dataset still takes longer than two days. Hence, using semantic features for line-level defect prediction remains challenging.…”
Section: Challenges in Machine Learning-Based Approaches (mentioning)
confidence: 99%
“…) [3,4]. A d2h value of 0 indicates that an approach achieves a perfect identification, i.e., an approach can identify all defective lines (Recall = 1) without any false positives (FAR = 0).…”
Section: Evaluation Measures (mentioning)
confidence: 99%
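The d2h ("distance to heaven") measure described in the quote above can be sketched directly from its definition. The normalisation by the square root of 2 is assumed here, following the common form in the defect-prediction literature, so that the worst case (Recall = 0, FAR = 1) scores exactly 1:

```python
import math

def d2h(recall, far):
    """Distance from (recall, far) to the ideal point "heaven"
    (Recall = 1, FAR = 0), normalised to the range [0, 1].
    Lower is better; 0 means perfect identification."""
    return math.sqrt((1 - recall) ** 2 + (0 - far) ** 2) / math.sqrt(2)

print(d2h(1.0, 0.0))  # perfect identification -> 0.0
print(d2h(0.0, 1.0))  # worst case -> 1.0
```

This matches the quote: d2h is 0 exactly when all defective lines are found (Recall = 1) with no false positives (FAR = 0).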
“…Perhaps if we augmented early life cycle defect predictors with a little transfer learning (from other projects [43]), then we could generate better performing predictors. Further to the last point, another interesting avenue of future research might be hyper-parameter optimization (HPO) [20], [80], [81]. HPO is often not applied in software analytics due to its computational complexity.…”
Section: Conclusion (mentioning)
confidence: 99%