2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)
DOI: 10.1109/icsme.2018.00018

AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models

Abstract: The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated in defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlatio…

Citations: cited by 53 publications (26 citation statements)
References: 76 publications
“…As Table 10 shows, some of our features are likely to be correlated, e.g., LineCountText and LengthText. To mitigate correlated metrics, we used AutoSpearman [35], an automated metric selection approach based on correlation analyses, with a threshold of 0.7.…”
Section: Non-documentation Links (mentioning)
confidence: 99%
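As a rough illustration of what correlation-based metric selection with a 0.7 threshold can look like, the sketch below computes Spearman rank correlations between metrics and greedily drops one metric from each strongly correlated pair. It is a minimal sketch under stated assumptions, not the AutoSpearman implementation from [35]: the function names (`spearman`, `drop_correlated`), the tie-breaking rule, the `LinkCount` metric, and all data values are hypothetical, and any further steps the actual tool performs are omitted.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(metrics, threshold=0.7):
    """Greedily remove metrics until no pair has |rho| >= threshold.

    Assumed rule (hypothetical): from the most correlated pair, drop the
    metric with the higher mean |rho| against the remaining metrics;
    on a tie, drop the second metric of the pair.
    """
    kept = dict(metrics)
    while True:
        names = list(kept)
        worst = None
        for a, b in combinations(names, 2):
            rho = abs(spearman(kept[a], kept[b]))
            if rho >= threshold and (worst is None or rho > worst[0]):
                worst = (rho, a, b)
        if worst is None:
            return names
        _, a, b = worst
        others = [n for n in names if n not in (a, b)]

        def mean_abs(name):
            if not others:
                return 0.0
            return sum(abs(spearman(kept[name], kept[o])) for o in others) / len(others)

        del kept[a if mean_abs(a) > mean_abs(b) else b]


# Made-up values: LengthText grows monotonically with LineCountText.
example = {
    "LineCountText": [10, 20, 30, 40, 50],
    "LengthText": [100, 210, 290, 405, 480],
    "LinkCount": [3, 1, 4, 1, 5],
}
selected = drop_correlated(example, threshold=0.7)
print(selected)
```

With these made-up values, only one of `LineCountText` and `LengthText` survives, because their ranks are identical, while `LinkCount` is kept because its rank correlation with either of them is below the threshold.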
“…Prior work points out that software metrics are often correlated [22,32,33,35,36,74,77,85]. However, little is known the prevalence of correlated metrics in the publicly-available defect datasets.…”
Section: Correlated Metrics and Concerns In The Literature (mentioning)
confidence: 99%
“…Metrics of prior studies are often correlated [22,32,33,35,36,74,77,85]. For example, Herraiz et al [33], and Gil et al [22] point out that code complexity (CC) is often correlated with code size (size).…”
Section: Introduction (mentioning)
confidence: 99%
“…Among the three main feature selection methods, filter methods are preferred to wrapper and embedded methods in applications where the computational efficiency, classifier independence, simplicity, ease of use and the stability of the results are required. Therefore, filter feature selection remains an interesting topic in many recent research areas such as biomarker identification for cancer prediction and drugs discovery, text classification and predicting defective software [3][4][5]10,11,16,18] and has growing interest in big data applications [19]; according to the Google Scholar search results, the number of research papers published related to filter methods in year 2018 is ∼1,800 of which ∼170 are in gene selection area.…”
Section: Introduction (mentioning)
confidence: 99%
“…The nearby pixels in images can be grouped together based on their spatial locality to improve selection of pixels for image classification. In software data, software metrics can be grouped according to their granularity in the code to improve the prediction of defective software [11,18]. In Sect.…”
Section: Introduction (mentioning)
confidence: 99%