A Hybrid Instance Selection Using Nearest-Neighbor for Cross-Project Defect Prediction

Ryu, Duksan; Jang, Jong‐In; Baik, Jongmoon

doi:10.1007/s11390-015-1575-5

Cited by 96 publications

(84 citation statements)

References 21 publications

“…Instance and dataset selection methods have been explored in CPDP. These include relevancy filtering by Turhan et al [1] based on the euclidean distance measure, data distributional characteristics and meta-learners by He et al [8], clustering by Herbold [9] and selective learning by Ryu et al [10]. These studies however, do not consider a search based approach.…”

Section: Related Work (mentioning)

confidence: 99%

An Exploratory Study of Search Based Training Data Selection for Cross Project Defect Prediction

Hosseini

Turhan

2018

2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)

Context: Search-based approaches are gaining attention in cross-project defect prediction (CPDP). The complexity of such approaches and the existence of various design decisions are important issues to consider. Objective: We aim to investigate factors that can affect the performance of search-based selection (SBS) approaches. We study a genetic instance selection approach (GIS) and present an evaluation of design options for search-based CPDP. Method: Using an exploratory approach, data from different model options are gathered and analyzed through ANOVA tests and effect sizes. Results: Both feature sets and validation dataset selection options show small or insignificant impacts on F-measure and precision, unlike the more affected false positive and true negative rates. The size of the training data does not seem to be related to significant changes in F-measure and precision, and high variability in performance is discouraging evidence for using larger datasets. The fitness function is one of the major factors that impact performance, with a much larger effect than the choice of validation dataset. Finally, while showing slight impacts, data label changes do not seem to be the top contributor to performance. Conclusions: We conclude that exploratory approaches can be effective for making design decisions in constructing search-based CPDP models. The effect of individually tuned learners and their interaction with other affecting parameters, and a more in-depth study of quality-affecting factors guided by label changes, are directions to investigate.

“…Tomasev et al [15] argued about hubness effect related to nearest neighbor that minority class instances are responsible for misclassification in high dimensional data unlike the fact that majority classes are mostly the reason of misclassification in low and medium dimensional data. Ryu et al [16] proposed HISNN, an instance based hybrid selection using nearest neighbor for cross project defect prediction. In this class imbalance is existed in source and target projects.…”

Section: Related Work (mentioning)

confidence: 99%

Improved Fuzzy-Optimally Weighted Nearest Neighbor Strategy to Classify Imbalanced Data

Patel

Thakur

2017

IJIES

Abstract: Learning from imbalanced data is one of the burning issues of the era. Traditional classification methods exhibit degraded performance when dealing with imbalanced data sets because of the skewed distribution of data across classes. Among the various suggested solutions, instance-based weighted approaches have secured a place in such cases. In this paper, we propose a new fuzzy weighted nearest neighbor method that optimally handles the imbalance issue. The use of optimal weights improves the performance of the fuzzy nearest neighbor algorithm for the default balanced distribution of data; for the classification of imbalanced data, the concept of adaptive K is merged with it, applying a large K (number of nearest neighbors) for the large class and a small K for the small class. We deploy this combination to classify imbalanced data with better accuracy for different evaluation measures. Experimental results affirm that our proposed method performs better than traditional fuzzy nearest neighbor classification for these types of data sets.

“…al. [15] proposed an instance hybrid selection using nearest neighbor (HISNN). In such cases class imbalance exists in source and target project distribution.…”

Section: Related Work (mentioning)

confidence: 99%

Classification of Imbalanced Data Using a Modified Fuzzy-Neighbor Weighted Approach

Patel

Thakur

2017

IJIES

Classification of imbalanced datasets is one of the widely explored challenges of the decade. Imbalance occurs in many real-world datasets because data are unevenly distributed across classes: one class has many instances while the others have few. As a result, traditional classifiers are biased towards the majority class and tend to ignore the classes with less data. Many solutions have been proposed to deal with this issue using various crisp and fuzzy methods. This paper proposes a new hybrid fuzzy weighted nearest neighbor approach to achieve better overall classification performance for both minority and majority classes of imbalanced data. The benefits of the neighbor-weighted K nearest neighbor approach, i.e. the assignment of large weights to small classes and small weights to large classes, are merged with fuzzy logic. Fuzzy classification helps in classifying objects more adequately because it determines how much an object belongs to a class. Experimental results show improvements in the classification of imbalanced data with different imbalance ratios in comparison with other methods.
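The weighting idea described in this abstract (large weights for small classes, small weights for large classes) can be illustrated with a minimal sketch. This is not the authors' method: the fuzzy membership component is omitted, and the weight formula (smallest class size divided by class size), the function names `class_weights` and `weighted_knn_predict`, and the labels `clean` and `defective` are all assumptions made for illustration.

```python
import math
from collections import Counter


def class_weights(labels):
    """Assumed weighting: the smallest class gets 1.0, larger classes get less."""
    counts = Counter(labels)
    smallest = min(counts.values())
    return {label: smallest / count for label, count in counts.items()}


def weighted_knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a class-weighted vote among its k nearest neighbors."""
    weights = class_weights(train_labels)
    # Rank training instances by Euclidean distance to the query point.
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    # Each neighbor votes with the weight of its class rather than with 1.
    votes = Counter()
    for _, label in nearest:
        votes[label] += weights[label]
    return max(votes, key=votes.get)


# Hypothetical toy data: six majority-class and two minority-class instances.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (2, 2), (2.2, 2.1)]
labels = ["clean"] * 6 + ["defective"] * 2
prediction = weighted_knn_predict(points, labels, (1.3, 1.3), k=3)
```

In this toy setup, a query near the class boundary has two majority neighbors and one minority neighbor among its three nearest, so a plain majority vote would favor the larger class, while the class-weighted vote lets the single minority neighbor outweigh them.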
