GCN2defect : Graph Convolutional Networks for SMOTETomek-based Software Defect Prediction

Zeng, Cheng; Zhou, Chun; Lv, Sheng; He, Peng; Huang, Jie

doi:10.1109/issre52982.2021.00020

Cited by 14 publications

(14 citation statements)

References 29 publications

“…Secondly, for FSet-3, building a software network [5] based on the actual dependencies between code files may better represent their relationships. Since the projects in the datasets may across multiple versions, it is not possible to construct the corresponding software network accurately, so we use a co-occurrence network (COON) instead.…”

Section: Discussion (mentioning)

confidence: 99%

An Exploratory Study of Bug Prioritization and Severity Prediction based on Source Code Features

Zhou, Cheng, He

2022

Proceedings of the 34th International Conference on Software Engineering and Knowledge Engineering

Software systems accumulate a large number of bugs during their lifecycles, and managing and assigning the resulting bug reports is a challenging task. Building models that predict the priority or severity level of a bug from its report can help developers address the most urgent bugs first. Traditional prediction models rely on the textual description in bug reports. However, in most reports the description is brief or missing. To resolve a bug report, developers need to fix the corresponding source code files. If such a file is a core module of the software system, the report is likely to receive a high priority or severity level. Therefore, in this paper, we investigate the effect of using source code file feature sets on classification performance. In addition, we evaluate the effect of different sampling methods on the data, namely SMOTE, RUS, SMOTEEN, Adaboost, and GAN. Extensive experiments were conducted on five open-source projects. The experimental results show that the source code file feature sets do not perform as well as the textual description features in bug reports. In addition, over-sampling methods do not alleviate the data imbalance problem when data are insufficient, while GAN performs best when data are sufficient.

“…Once the file is defective, it is more likely to have a higher priority or severity. Leveraging neural networks to capture semantic and syntactic features from source files is widely used for bug localization [4] and defect prediction [5].…”

Section: Introduction (mentioning)

confidence: 99%

An Exploratory Study of Bug Prioritization and Severity Prediction based on Source Code Features

Zhou, Cheng, He

2022

Proceedings of the 34th International Conference on Software Engineering and Knowledge Engineering

“…For example, Qu et al [10] used network embedding technique, node2vec, to automatically learn to encode dependency network structure into low-dimensional vector spaces to improve software defect prediction. Zeng et al [11] also recently analyzed the influence of network structure features of code on defect prediction.…”

Section: B. Representation Learning in Software Engineering (mentioning)

confidence: 99%

“…Given a path of the source code, the token sequences of all files will be output. As treated in [7], we only select three types of nodes on ASTs as tokens: (1) nodes of method invocations and class instance creations; (2) declaration nodes, i.e., method/type/enum declarations; (3) control flow nodes, such as while, if, and throw. For more details, please refer to our previous work [11].…”

Section: A. Generation of Local Semantic Features, 1) Parsing AST (mentioning)

confidence: 99%
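
The token selection described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: it uses Python's standard `ast` module as a stand-in parser, and the mapping of the three node types onto Python AST classes (`CONTROL_FLOW`, `DECLARATIONS`) as well as the helper names `call_name` and `extract_tokens` are assumptions made for illustration.

```python
import ast

# Assumed mapping of the three node categories onto Python's ast classes.
# The quoted statement lists the categories only; it does not give this mapping.
CONTROL_FLOW = (ast.If, ast.While, ast.For, ast.Raise, ast.Try)
DECLARATIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def call_name(node):
    """Return a readable name for the callee of a call node."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return "<call>"


def extract_tokens(source):
    """Collect tokens for the selected node types in pre-order."""
    tokens = []

    def visit(node):
        if isinstance(node, ast.Call):
            # (1) invocations and instance creations
            tokens.append(call_name(node))
        elif isinstance(node, DECLARATIONS):
            # (2) declaration nodes
            tokens.append(node.name)
        elif isinstance(node, CONTROL_FLOW):
            # (3) control flow nodes, named after their node type
            tokens.append(type(node).__name__.lower())
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


SAMPLE = '''
class Stack:
    def push(self, item):
        if item is None:
            raise ValueError("empty")
        self.items.append(item)
'''

print(extract_tokens(SAMPLE))
# ['Stack', 'push', 'if', 'raise', 'ValueError', 'append']
```

All other node types (names, constants, operators) are skipped, so each file is reduced to a short token sequence that a downstream model can embed.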

“…As we known, programs have well-defined syntax and rich semantics hidden in the Abstract Syntax Trees (ASTs), which have been successfully extracted and used for defect prediction [7]. In addition, researchers also validated that the globally structural information extracted by network representation learning can lead to more accurate defect prediction [9][10][11]. In other word, both the local semantic and global structural information of source code files may affect the selection of TDS in CPDP.…” [Footnote 1 in the quoted source: DOI reference number: 10.18293/SEKE2022-086]

Section: Introduction (mentioning)

confidence: 98%

Data Selection for Cross-Project Defect Prediction with Local and Global Features of Source Code

He, Zhou

2022

Proceedings of the 34th International Conference on Software Engineering and Knowledge Engineering

An open challenge for cross-project defect prediction (CPDP) is how to select the most appropriate training data for a target project in order to build a quality predictor. To our knowledge, existing methods are mostly dominated by traditional hand-crafted features, which fully encode neither the global structure of the code nor the semantics of code tokens. This work proposes an improved method that automatically learns features for representing source code and uses these features for training data selection. First, we propose a framework, ALGoF, to automatically learn the local semantic and global structural features of code files. Then, we analyze the feasibility of the learned features for data selection. We also validate the effectiveness of ALGoF by comparing it with the traditional method. The experiments have been conducted on six defect datasets available at the PROMISE repository. The results show that the ALGoF method helps to guide the training data selection for CPDP, and achieves a 48.31% improvement rate of F-measure. Meanwhile, our method has statistically significant advantages over the traditional method, especially when using both the local semantic and global structural features as the representation of code files. The maximum improvement of F-measure can reach 42.6%.

LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction

Yang, Zhong, Zeng et al.

2024

Empir Software Eng
