Two-Stage AST Encoding for Software Defect Prediction

Zhou, Yanwu; Lu, Lu; Zou, Quanyi; Li, Cuixu

doi:10.18293/seke2022-039

Cited by 2 publications

(3 citation statements)

References 13 publications


“…Similar to previous SDD research [10,12,34], we use four evaluation metrics to examine experimental results: F1-score, Recall, Precision, and MCC. Considering the context of the experimental dataset, we opt for the weighted versions of the F1-score, Recall, and Precision instead of the default binary [35].…”

Section: Evaluation Metrics (mentioning)

confidence: 99%
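The weighted metrics named in the statement above can be illustrated with a minimal sketch. It assumes the usual definitions: per-class precision, recall, and F1 averaged with weights proportional to each class's support, and the multi-class generalization of MCC computed from the confusion counts. The function names and the toy labels are hypothetical and are not taken from the cited paper.

```python
from collections import Counter
from math import sqrt


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predicted instances per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label in labels:
        tp = hits[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support[label] / n  # class weight = share of true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


def multiclass_mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for two or more classes."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    n = len(y_true)
    numerator = correct * n - sum(support[k] * predicted[k] for k in support)
    spread_true = n * n - sum(v * v for v in support.values())
    spread_pred = n * n - sum(v * v for v in predicted.values())
    denominator = sqrt(spread_true * spread_pred)
    return numerator / denominator if denominator else 0.0


# Toy three-class example (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(weighted_scores(y_true, y_pred))
print(multiclass_mcc(y_true, y_pred))
```

With support weighting, weighted recall equals plain accuracy, which is one reason MCC is often reported alongside it on imbalanced multi-class data.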


Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Wang, Lu, Yang et al. 2024

Int J Comput Intell Syst

Self Cite


Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only alleviate the workload of software testers but also enhance the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advancements. However, these methods still exhibit limitations in terms of reliability and usability. Therefore, we introduce MSDD-(IA)3, a novel framework leveraging the pre-trained CodeT5+ and (IA)3 for parameter-efficient multi-classification SDD. This framework constructs a detection model based on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we inject (IA)3 vectors into specific layers and update only these injected parameters to reduce the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 outperforms state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the trainable parameters of MSDD-(IA)3 are only 0.04% of those of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
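The parameter-efficient idea mentioned in the abstract can be sketched in a few lines. This is only an illustration of the general (IA)3 technique, in which a frozen layer's output is rescaled element-wise by a small learned vector initialized to ones; it is not the MSDD-(IA)3 implementation, and the abstract does not say which CodeT5+ layers receive the vectors. All names and the tiny weight matrix are hypothetical.

```python
def frozen_linear(weights, x):
    """Apply a frozen linear map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ia3_linear(weights, scale, x):
    """Rescale the frozen layer's output element-wise by the learned vector."""
    return [s * h for s, h in zip(scale, frozen_linear(weights, x))]


def trainable_fraction(weights, scale):
    """Share of parameters that would be updated during fine-tuning."""
    frozen = sum(len(row) for row in weights)
    return len(scale) / (frozen + len(scale))


# Hypothetical 2x3 frozen weight matrix and input vector.
W = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
x = [1.0, 1.0, 1.0]

# A scaling vector of ones leaves the pre-trained behaviour unchanged.
identity_scale = [1.0, 1.0]
print(ia3_linear(W, identity_scale, x))

# After training, only the scaling vector differs; W stays untouched.
tuned_scale = [0.5, 2.0]
print(ia3_linear(W, tuned_scale, x))
print(trainable_fraction(W, tuned_scale))
```

Because the scaling vector grows with the layer width while the weight matrix grows with width times input size, the trainable share shrinks rapidly for large layers, which is consistent with the very small fraction the abstract reports.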


“…(1) TSE: This method designs a two-stage SDD using self-attention mechanism and tree-based LSTMs [34]. (2) PHAN: This method employs a positional hierarchical attention network to extract semantic features from programs [36].…”

Section: Approach (mentioning)

confidence: 99%

Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Wang, Lu, Yang et al. 2024

Int J Comput Intell Syst

Self Cite


“…Firstly, whether AST is flattened as a text sequence or maintains its original structure, they encode the AST as a whole, ignoring the information at a moderate granularity level. Secondly, although some studies in SDP have taken a hierarchical structure into account, such as decomposing code into tokens and lines levels [1], and splitting AST into nodes and subtrees levels [6], there is a lack of encoding at the path granularity in SDP. Furthermore, existing AST presentation by mining paths [7] does not consider the positional information between paths, while the positional difference may indicate the existence of defects.…”

DOI reference number: 10.18293/SEKE23-119

Section: Introduction (mentioning)

confidence: 99%

Software Defect Prediction via Positional Hierarchical Attention Network (S)

Yi, Xu et al. 2023

International Conferences on Software Engineering and Knowledge Engineering


Software Defect Prediction (SDP) aims to identify defect-prone modules in advance to ensure software quality. In SDP research based on deep learning, the mainstream approach is to extract deep semantic features from an Abstract Syntax Tree (AST). Theoretically, the AST as a bi-dimensional structure encloses information at the node level, fragment level, and entire tree level. However, most existing research serializes the whole AST without considering the expression at different granularities. To address this limitation, we introduce a positional hierarchical attention network (PHAN) that acquires semantic features by simultaneously considering contexts between nodes and paths. Specifically, our model incorporates attention mechanisms to capture information of varying importance at separate hierarchies, and relative position representations to distinguish the contributions of different paths. Experimental results demonstrate that PHAN significantly outperforms existing baseline methods.
