2012
DOI: 10.1016/j.ins.2011.01.039

Searching for rules to detect defective modules: A subgroup discovery approach

Abstract: Data mining methods in software engineering are becoming increasingly important as they can support several aspects of the software development life-cycle such as quality. In this work, we present a data mining approach to induce rules extracted from static software metrics characterising fault-prone modules. Due to the special characteristics of the defect prediction data (imbalanced, inconsistency, redundancy) not all classification algorithms are capable of dealing with this task conveniently. To deal with …

Citations: cited by 55 publications (18 citation statements)
References: 50 publications
“…However, these methods focus only on maximizing classification accuracy and aim to achieve interpretability just by building the model from rules. Similarly, algorithms for problems such as subgroup discovery [24, 33, 40, 46], contrast set learning [3, 4, 30], and emerging pattern mining [17, 19] identify sets of rules to describe the relationships among variables and discover interesting patterns in the data. In contrast, our work explicitly defines an objective function that scores interpretability and accuracy, and by optimizing it, we find a globally near-optimal model.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…The KC2 dataset was a data processing project developed with C++ language and included 523 modules, of which 15% were defective. The KC3 dataset was a storage management project developed with Java language and included 458 modules, of which 9% were defective [20]. The CM1 dataset was a NASA spacecraft instrument project developed with C language and included 498 modules, of which 10% were defective.…”
Section: System Description
Citation type: mentioning
confidence: 99%
“…Below, a rule of this type can be observed:
R: IF X1 = 3 AND X2 = Spain THEN Target_value
Pairs of variable/interval. These rules are employed by EDER‐SD and GP3‐SD, for example. Next, an instance of this kind of rules is presented:
R: IF X1 = [1, 3] AND X2 = Spain THEN Target_value
Pairs of variable/value with order relations.…”
Section: Subgroup Discovery
Citation type: mentioning
confidence: 99%
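
The two rule forms quoted in the statement above (variable/value pairs and variable/interval pairs) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Rule`, `equals`, `in_interval`, and `support_and_confidence` are hypothetical, the five sample rows are invented, and the interval is treated as closed. It is not the representation used by EDER-SD, GP3-SD, or the cited paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Condition = Callable[[Row], bool]


def equals(var: str, value: Any) -> Condition:
    """Variable/value pair, e.g. X1 = 3."""
    return lambda row: row[var] == value


def in_interval(var: str, low: float, high: float) -> Condition:
    """Variable/interval pair, e.g. X1 = [1, 3] (closed interval assumed)."""
    return lambda row: low <= row[var] <= high


@dataclass
class Rule:
    conditions: List[Condition]  # antecedent: conjunction of conditions
    target: Any                  # consequent: value of the target variable

    def covers(self, row: Row) -> bool:
        return all(cond(row) for cond in self.conditions)


def support_and_confidence(rule: Rule, rows: List[Row],
                           target_key: str = "target") -> Tuple[int, float]:
    """Return (number of covered rows, fraction of those with the target value)."""
    covered = [r for r in rows if rule.covers(r)]
    if not covered:
        return 0, 0.0
    hits = sum(1 for r in covered if r[target_key] == rule.target)
    return len(covered), hits / len(covered)


# Invented toy data, for illustration only.
rows = [
    {"X1": 3, "X2": "Spain", "target": "yes"},
    {"X1": 1, "X2": "Spain", "target": "yes"},
    {"X1": 2, "X2": "Spain", "target": "no"},
    {"X1": 3, "X2": "France", "target": "yes"},
    {"X1": 5, "X2": "Spain", "target": "no"},
]

# R: IF X1 = 3 AND X2 = Spain THEN target = yes
value_rule = Rule([equals("X1", 3), equals("X2", "Spain")], "yes")

# R: IF X1 = [1, 3] AND X2 = Spain THEN target = yes
interval_rule = Rule([in_interval("X1", 1, 3), equals("X2", "Spain")], "yes")

print(support_and_confidence(value_rule, rows))
print(support_and_confidence(interval_rule, rows))
```

On this toy data the interval rule covers more rows than the value rule but with lower confidence, which is the usual trade-off between generality and precision when comparing subgroup descriptions.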