2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) 2022
DOI: 10.1109/icstw55395.2022.00039
µBERT: Mutation Testing using Pre-Trained Language Models

Abstract: We introduce µBERT, a mutation testing tool that uses a pre-trained language model (CodeBERT) to generate mutants. It masks a token in the expression given as input and uses CodeBERT to predict it; the mutants are then generated by replacing the masked tokens with the predicted ones. We evaluate µBERT on 40 real faults from Defects4J and show that it can detect 27 of the 40 faults, while the baseline (PiTest) detects 26 of them. We also show that µBERT can be 2 times more cost-effective …
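The mask-and-predict loop described in the abstract can be sketched as follows. This is a hypothetical illustration, not µBERT's actual implementation: `predict_masked_token` stands in for a real CodeBERT masked-language-model query, and returns fixed candidates so the sketch stays self-contained.

```python
# Hypothetical sketch of µBERT-style mutant generation.
# `predict_masked_token` is an illustrative stand-in for querying
# CodeBERT's masked language model; it is NOT the tool's real API.

MASK = "<mask>"

def predict_masked_token(masked_expr):
    # A real implementation would return CodeBERT's top-k token
    # predictions for the masked position; fixed candidates keep
    # this sketch runnable without a model.
    return ["-", "*", "+"]

def generate_mutants(expr_tokens):
    """Mask each token in turn; each predicted replacement that
    differs from the original token yields one mutant."""
    mutants = []
    for i, original in enumerate(expr_tokens):
        masked = expr_tokens[:i] + [MASK] + expr_tokens[i + 1:]
        for candidate in predict_masked_token(" ".join(masked)):
            if candidate != original:  # identical replacement is not a mutant
                mutants.append(expr_tokens[:i] + [candidate] + expr_tokens[i + 1:])
    return mutants

mutants = generate_mutants(["a", "+", "b"])
```

For the input `a + b`, masking the operator position produces mutants such as `a - b` and `a * b`; predictions identical to the original token are discarded, since they would not change program behaviour.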

Cited by 16 publications (16 citation statements) · References 29 publications
“…Table 1 summarises the type of targeted AST nodes by µBERT, with corresponding example expressions and induced mutants. We refer to these as the conventional mutations provided by µBERT, denoted by µBERT conv in our evaluation, previously introduced in the preliminary version of the approach [18].…”
Section: AST Nodes Selection
confidence: 99%
“…To answer this question, we generate two sets of mutants using µBERT: 1) the first set using all possible mutations, which we denote as µBERT, and 2) a second one using only the conventional µBERT mutations -part of our preliminary implementation [18], excluding the additive ones -which we denote as µBERT conv . Then we evaluate the fault detection ability of test suites selected to kill the mutants from each set.…”

Section: Research Questions
confidence: 99%
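The comparison described in this citation statement relies on selecting test suites that kill the mutants of each set. A minimal sketch of such kill-based selection, assuming a greedy strategy over a hypothetical kill matrix (the test and mutant names are invented for illustration):

```python
# Hypothetical sketch of kill-based test selection; the kill matrix
# below is invented for illustration, not data from the paper.

def select_killing_tests(kills):
    """Greedily pick tests until every killable mutant is killed.

    `kills` maps each test name to the set of mutants it kills.
    """
    remaining = set().union(*kills.values())  # mutants killed by some test
    selected = []
    while remaining:
        # Pick the test that kills the most still-unkilled mutants.
        best = max(kills, key=lambda t: len(kills[t] & remaining))
        if not kills[best] & remaining:
            break  # defensive: no test makes further progress
        selected.append(best)
        remaining -= kills[best]
    return selected

suite = select_killing_tests({
    "t1": {"m1", "m2"},
    "t2": {"m2", "m3"},
    "t3": {"m3"},
})
```

Running this on the toy matrix selects `t1` then `t2`, which together kill all three mutants; the fault detection of the resulting suite can then be compared across the two mutant sets.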
“…The usage of this model in the fault injection domain shows its ability to seed "natural" faults that semantically resemble real faults [34], [35]. Effectively, the injected faults resemble what a real programmer could write (regarding programming rules, conventions, etc.) [36].…”

Section: NLP for Fault Injection (Not Vulnerability Injection)
confidence: 99%