2019 IEEE International Conference on Artificial Intelligence Testing (AITest)
DOI: 10.1109/aitest.2019.000-6

Input Prioritization for Testing Neural Networks

Abstract: Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation for providing high assurance. Though formal verification approaches are being investigated, testing remains the primary technique for assessing the depen…

Cited by 65 publications (43 citation statements)
References 22 publications

“…By removing such corner cases we expect to get a less discriminative test set, which is less effective in assessing the quality of the model under test. This approach has previously been used for a similar task by Jahangirova & Tonella [22] and for test input prioritisation by Byun et al [13]. In our experiment, for classification systems we build a weak test set by keeping only the test inputs that are predicted with a confidence equal to 1, where confidence is measured as the highest softmax output value.…”
Section: RQ3 [Comparison With…]
Citation type: mentioning; confidence: 99%
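
As an illustration of the filtering step described in the excerpt above, the sketch below keeps only the test inputs whose top softmax value equals 1 (up to floating-point tolerance). The function name, the NumPy array inputs and the tolerance are assumptions made for illustration; they are not taken from the cited study.

import numpy as np

def build_weak_test_set(softmax_outputs, test_inputs, test_labels):
    # Keep only the inputs the model predicts with maximal confidence,
    # where confidence is the highest softmax output value per input.
    confidence = softmax_outputs.max(axis=1)      # top softmax score for each input
    keep = confidence >= 1.0 - 1e-7               # "confidence equal to 1", with float tolerance
    return test_inputs[keep], test_labels[keep]

Keeping only the maximally confident inputs discards the corner cases, which is what makes the resulting test set weak (less discriminative) in the sense of the excerpt.
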
“…DeepCT [50] proposes a combinatorial testing approach, while DeepCover [69] adapts MC/DC from traditional software testing and defines adequacy criteria that investigate the changes of successive pairs of layers. Recent research also proposes testing criteria and techniques driven by symbolic execution [31], coverage guided fuzzing [56,76] and metamorphic transformations [72], while other research explores test prioritization [16] and fault localisation [24].…”
Section: Related Work
Citation type: mentioning; confidence: 99%
“…These two aspects lead to high test generation costs. Byun et al [148] used DNN metrics like cross entropy, surprisal, and Bayesian uncertainty to prioritise test inputs and experimentally showed that these are good indicators of inputs that expose unacceptable behaviours, which are also useful for retraining.…”
Section: Test Prioritisation and Reduction
Citation type: mentioning; confidence: 99%
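
The sketch below illustrates one way such a confidence-related metric can drive prioritisation: each input is scored by the entropy of the model's softmax output, and the most uncertain inputs are ranked first. The function name and the choice of plain softmax entropy are illustrative assumptions; the surprisal and Bayesian-uncertainty metrics used by Byun et al. are not reproduced here.

import numpy as np

def prioritize_by_entropy(softmax_outputs):
    # Score each input by the Shannon entropy of its softmax distribution;
    # higher entropy means a less confident prediction and a higher test priority.
    eps = 1e-12                                   # guard against log(0)
    entropy = -np.sum(softmax_outputs * np.log(softmax_outputs + eps), axis=1)
    return np.argsort(-entropy)                   # input indices, most uncertain first

Applying the returned ordering to a held-out set surfaces the inputs the model is least sure about first, which the excerpt notes are good indicators of unacceptable behaviours and useful candidates for retraining.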