Because training and testing data differ in the feature space, cross-project defect prediction (CPDP) remains unaddressed by traditional machine learning. Recently, transfer learning has become a research hotspot for building classifiers in a target domain using data from related source domains. To build better CPDP models, recent studies focus on either feature transfer or instance transfer to weaken the impact of irrelevant cross-project data. In contrast, this work proposes a dual weighting mechanism that considers both feature transfer and instance transfer to aid the learning process. In our method, a local data gravitation between the source and target domains determines the instance weight, while features that are highly correlated with the learning task, uncorrelated with other features, and that minimize the difference between the domains are rewarded with a higher feature weight. Experiments on 25 real-world datasets indicate that the proposed approach outperforms existing CPDP methods in most cases. By assigning weights according to the different contributions of features and instances to the predictor, the proposed approach builds a better CPDP model and demonstrates substantial improvements over state-of-the-art CPDP models. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
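The dual weighting idea described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the function names `gravitation_instance_weights` and `feature_weights` are hypothetical, the instance weight uses a simple gravitation-style rule (count how many features of a source instance fall inside the target's per-feature range), and the feature score simply combines label relevance, redundancy with other features, and a mean-based domain gap. It should not be read as the authors' actual method.

```python
import numpy as np

def gravitation_instance_weights(X_source, X_target):
    """Assumed gravitation-style weight: s / (k - s + 1)^2, where s is the number
    of features of a source instance lying inside the target's [min, max] range
    and k is the number of features. Not taken from the paper."""
    Xs = np.asarray(X_source, dtype=float)
    Xt = np.asarray(X_target, dtype=float)
    lo, hi = Xt.min(axis=0), Xt.max(axis=0)
    k = Xs.shape[1]
    s = ((Xs >= lo) & (Xs <= hi)).sum(axis=1)
    return s / (k - s + 1.0) ** 2

def feature_weights(X_source, y_source, X_target, eps=1e-12):
    """Hypothetical feature score: high label relevance, low redundancy with
    other features, and a small source/target gap all raise the weight."""
    Xs = np.asarray(X_source, dtype=float)
    Xt = np.asarray(X_target, dtype=float)
    y = np.asarray(y_source, dtype=float)
    n, k = Xs.shape
    # Standardise columns so that dot products give Pearson correlations.
    Z = (Xs - Xs.mean(axis=0)) / (Xs.std(axis=0) + eps)
    zy = (y - y.mean()) / (y.std() + eps)
    relevance = np.abs(Z.T @ zy) / n
    corr = np.abs(Z.T @ Z) / n
    # Mean absolute correlation with the *other* features.
    redundancy = (corr.sum(axis=1) - np.diag(corr)) / max(k - 1, 1)
    # Domain gap: difference of means scaled by the two standard deviations.
    gap = np.abs(Xs.mean(axis=0) - Xt.mean(axis=0)) / (
        Xs.std(axis=0) + Xt.std(axis=0) + eps
    )
    score = np.clip(relevance * (1.0 - redundancy) / (1.0 + gap), 0.0, None)
    total = score.sum()
    # Fall back to uniform weights if every score is zero.
    return score / total if total > 0 else np.full(k, 1.0 / k)

# Toy data, invented for illustration only.
X_target = [[0.0, 0.0], [1.0, 1.0]]
X_source = [[0.5, 0.5], [0.5, 2.0], [2.0, 2.0]]
y_source = [0, 1, 1]
instance_w = gravitation_instance_weights(X_source, X_target)
feature_w = feature_weights(X_source, y_source, X_target)
```

In a full pipeline, weights like these would typically be passed to a learner that accepts per-instance weights and to a feature-scaled distance or likelihood; how the paper combines them is not stated in the abstract.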
Software defect prediction (SDP) can identify modules that are likely to contain defects, which helps software developers allocate limited testing resources more efficiently. Because traditional software features fail to capture the semantics of source code, various studies have turned to extracting features with deep learning. Existing approaches often parse the program source code into Abstract Syntax Trees (ASTs) for further processing. However, most of these approaches ignore the hierarchical and position-sensitive structure of AST nodes. To address these issues, this paper proposes a two-stage AST encoding (TSE) method for software defect prediction. Experiments on eight open-source Java projects show that the proposed SDP method outperforms several traditional methods and state-of-the-art deep learning methods in terms of F-measure and MCC.
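To make "hierarchical and position-sensitive structure" concrete, the sketch below records each AST node's depth and its index among its siblings. It uses Python's standard `ast` module as a stand-in, since the paper works on Java projects and the abstract does not describe the TSE encoding itself; the function name `encode_nodes` and the tuple layout are hypothetical and chosen only for illustration.

```python
import ast

def encode_nodes(source):
    """Return (node_type, depth, sibling_index) for every AST node in pre-order.
    Depth captures hierarchy; sibling_index captures position among siblings."""
    tree = ast.parse(source)
    rows = []

    def visit(node, depth, position):
        rows.append((type(node).__name__, depth, position))
        for i, child in enumerate(ast.iter_child_nodes(node)):
            visit(child, depth + 1, i)

    visit(tree, 0, 0)
    return rows

# Small example input, invented for illustration.
rows = encode_nodes("x = 1")
for node_type, depth, position in rows:
    print("  " * depth + f"{node_type} (depth={depth}, position={position})")
```

A flat sequence of node types would lose both pieces of information; keeping the depth and sibling index alongside each node type is one simple way to preserve them before feeding the nodes to a learned encoder.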