2021
DOI: 10.3390/app11114793
An Empirical Study on Software Defect Prediction Using CodeBERT Model

Abstract: Deep learning-based software defect prediction has become popular in recent years. The release of the CodeBERT model has made it possible to tackle many software engineering tasks. We propose several CodeBERT variants targeting software defect prediction: CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT. We perform empirical studies using these models in cross-version and cross-project software defect prediction to investigate whether a neural language model like CodeBERT could improve pre…

Cited by 50 publications (24 citation statements). References 47 publications.
“…To encode file-level features, we tokenized and embedded the source code of each file using CodeBERT [16], a state-of-the-art code embedding model based on the RoBERTa architecture [40] that has been trained on millions of programming-language examples. We selected CodeBERT embeddings due to their prominence in recent literature, their promising performance in this domain [47], and their ability to make better use of small datasets [51]. We used a random forest classifier to perform classification, based on its proven success for file-level prediction in prior work [30].…”

Section: Software Vulnerability Prediction

confidence: 99%
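The pipeline quoted above (CodeBERT file embeddings fed to a random forest) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: real embeddings would come from a CodeBERT checkpoint (e.g. `microsoft/codebert-base` via the Hugging Face `transformers` library), whereas here random 768-dimensional vectors stand in as placeholders so the sketch runs without downloading a model.

```python
# Hypothetical sketch of file-level defect prediction with
# CodeBERT-style embeddings and a random forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_files, dim = 200, 768               # 768 = CodeBERT hidden size
X = rng.normal(size=(n_files, dim))   # placeholder file embeddings
y = rng.integers(0, 2, size=n_files)  # 1 = defective file, 0 = clean

# Hold out a quarter of the files for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
preds = clf.predict(X_te)  # one binary defect label per held-out file
```

In a real setting, each row of `X` would be obtained by tokenizing a source file, running it through CodeBERT, and pooling the token representations (commonly the `[CLS]` vector) into a single fixed-size file embedding.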
“…CodeBERT has pushed the boundaries in natural language processing and represents the state of the art for generating code documentation from snippets, as well as for retrieving code snippets given a natural language search query, across six different programming languages [41]. Moreover, it has also been applied to a variety of software engineering tasks [68].…”

Section: Threats To Validity

confidence: 99%
“…CodeBERT has pushed the boundaries in natural language processing and represents the state of the art for generating code documentation from snippets, as well as for retrieving code snippets given a natural language search query, across six different programming languages (Husain et al. 2019). Moreover, it has also been applied to a variety of software engineering tasks (Pan et al. 2021).…”

Section: Threats To Validity

confidence: 99%