Software defect prediction (SDP) helps developers allocate limited resources for locating bugs and prioritize their testing efforts. Existing methods often serialize an Abstract Syntax Tree (AST) obtained from the program source code into a token sequence, which is then fed into a deep learning model to learn semantic features. However, different ASTs can produce the same token sequence, so the tree structure of an AST cannot be distinguished from the token sequence alone. To solve this problem, this paper proposes a framework called Semantic Feature Learning via Dual Sequences (SFLDS), which captures both the semantic and the structural information in the AST for feature generation. Specifically, we select the representative nodes of the AST and convert the program source code into a simplified AST (S-AST). Our method introduces two sequences to represent the semantic and structural information of the S-AST: one is the result of traversing the S-AST nodes in pre-order, and the other is composed of parent nodes. Each token in the dual sequences is then encoded as a numerical vector via mapping and word embedding. Finally, we use a bi-directional long short-term memory (BiLSTM) based neural network to automatically generate semantic features from the dual sequences for SDP. In addition, to leverage the statistical characteristics contained in handcrafted metrics, we also propose a framework called Defect Prediction via SFLDS (DP-SFLDS), which combines the semantic features generated by SFLDS with handcrafted metrics to perform SDP. In our empirical studies, eight open-source Java projects from the PROMISE repository are chosen as empirical subjects.
Experimental results show that our proposed approach can perform better than several state-of-the-art baseline SDP methods.
INDEX TERMS Software defect prediction, abstract syntax tree, deep learning, bi-directional long short-term memory network.
LU LU received the Ph.D. degree from Xi'an Jiaotong University, in 1999. He is currently a Professor with the School of Computer Science and Engineering, South China University of Technology, China. His main research interests include software engineering, software testing, and software architecture design.
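The dual-sequence idea in the SFLDS abstract above can be illustrated with a minimal sketch. It assumes that the parent sequence is aligned position by position with the pre-order sequence; the abstract does not state this, and the `Node` class, the `dual_sequences` function, and the `<ROOT>` placeholder are hypothetical names introduced only for illustration. The node labels are example Java AST node types, not the paper's selected set. The sketch shows two different trees that share one pre-order token sequence but differ in their parent sequences.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node of a (simplified) syntax tree; `label` is the token."""
    label: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical placeholder used as the "parent" of the root node.
ROOT_PARENT = "<ROOT>"


def dual_sequences(root: Node) -> Tuple[List[str], List[str]]:
    """Return (pre-order tokens, parent tokens), aligned position by position.

    Assumption: the abstract only says the second sequence is "composed of
    parent nodes"; the per-position alignment used here is illustrative.
    """
    tokens: List[str] = []
    parents: List[str] = []
    stack = [(root, ROOT_PARENT)]
    while stack:
        node, parent_label = stack.pop()
        tokens.append(node.label)
        parents.append(parent_label)
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, node.label))
    return tokens, parents


# Two different trees with identical pre-order token sequences.
nested = Node("MethodDeclaration",
              [Node("IfStatement", [Node("MethodInvocation")])])
flat = Node("MethodDeclaration",
            [Node("IfStatement"), Node("MethodInvocation")])

tokens_nested, parents_nested = dual_sequences(nested)
tokens_flat, parents_flat = dual_sequences(flat)

print(tokens_nested == tokens_flat)    # same token sequence
print(parents_nested, parents_flat)    # different parent sequences
```

In the nested tree the invocation sits inside the `if` statement, while in the flat tree it follows it. A token sequence alone cannot tell the two apart; the parent sequence can in this example.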
Cross-project defect prediction (CPDP) is a promising approach for allocating testing efforts efficiently and guaranteeing software reliability early in the software lifecycle. A CPDP method usually trains a software defect classifier on labeled data sets from source projects. The trained classifier can then predict defects in new projects that have no labeled data. Most previous CPDP techniques focused on manually designing handcrafted features. However, these handcrafted features ignore the programs' semantic information. Other existing defect prediction approaches learned semantic features from source code to build classifiers directly, but they did not consider the distribution divergence between source and target projects. To address these limitations, we put forward a new method called Adversarial Discriminative Convolutional Neural Network (ADCNN), which extracts transferable semantic features from source code for CPDP tasks. Specifically, we first parse source files into token vectors and then map them to integer vectors via word embedding. Second, we combine adversarial learning with discriminative feature learning to train the ADCNN model. The key of the ADCNN model is to learn a discriminative mapping of the target project to the source feature space by deceiving a domain discriminator, which tries to distinguish the target project files from the source project files. Finally, we use the extracted transferable semantic features to build a classifier for CPDP tasks. We evaluate our method on ten benchmark projects in terms of F-measure, AUC, and PofB20 (an effort-aware evaluation metric). The experimental results demonstrate that our ADCNN method outperforms other related CPDP methods.
INDEX TERMS Cross-project defect prediction, transfer learning, adversarial learning, deep learning, convolutional neural network.
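The first ADCNN step above, turning token vectors into integer vectors, can be sketched as follows. This is only an illustration under stated assumptions: the vocabulary is built from source-project files alone, id 0 is reserved for padding and id 1 for tokens unseen in the source project, and every file is padded or truncated to a fixed length. None of these choices is stated in the abstract, and `build_vocab` and `encode` are hypothetical helper names. The embedding layer and the adversarial training are deliberately left out.

```python
from collections import Counter
from typing import Dict, List

PAD_ID = 0  # assumption: id reserved for padding
UNK_ID = 1  # assumption: id for tokens not seen in the source project


def build_vocab(token_seqs: List[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Assign an integer id to every token seen at least `min_count` times."""
    counts = Counter(tok for seq in token_seqs for tok in seq)
    vocab: Dict[str, int] = {}
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Map tokens to ids, truncate to `length`, and right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in seq][:length]
    return ids + [PAD_ID] * (length - len(ids))


# Illustrative token vectors; the node names are examples, not the paper's set.
source_files = [
    ["IfStatement", "MethodInvocation", "ReturnStatement"],
    ["ForStatement", "MethodInvocation"],
]
target_file = ["WhileStatement", "MethodInvocation"]

vocab = build_vocab(source_files)
source_ids = [encode(seq, vocab, length=4) for seq in source_files]
target_ids = encode(target_file, vocab, length=4)

print(source_ids)
print(target_ids)
```

Sharing one vocabulary between source and target projects keeps both inputs in the same id space, which a feature extractor applied to both projects would need.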