Cross-project defect prediction (CPDP) is used to build defect prediction models when data from the target project are insufficient. Several approaches have been proposed to improve the performance of CPDP, such as feature transformation and instance selection methods. However, existing techniques depend strongly on the target data to reduce the distribution discrepancy between source and target projects. That is, the performance of these methods is determined by the effectiveness of feature transformation or the similarity between the two projects. Additionally, when a large amount of source data must be matched with target data, the matching is time-consuming and reduces the efficiency of model construction. Therefore, it is vital to explore a target-project-agnostic approach to building CPDP models. This paper presents a Weighted Isolation Forest with class Label information Filter (WIFLF) to alleviate these issues. Four groups of datasets from AEEEM, Relink, and the PROMISE Data Repository are used to build CPDP models, and WIFLF is compared with 12 approaches. The experimental results indicate that WIFLF significantly outperforms all the baselines. Specifically, WIFLF with random forest significantly improves performance over the baselines on average by at least 14.64% and 4.90% with respect to Skewed F-Measure and G-Measure, respectively.