2021

DOI: 10.48550/arxiv.2112.12591

Preprint

Black-Box Testing of Deep Neural Networks Through Test Case Diversity

Zohreh Aghababaeyan¹,

Manel Abdellatif²,

Lionel C. Briand³

et al.

Abstract: Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics, and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNN models. Despite very active research on DNN coverage, sever…

Cited by 1 publication

(21 citation statements)

References 61 publications

(141 reference statements)

“…Similar to code coverage in traditional software systems [14], researchers have proposed several neuron coverage metrics in the context of DNN test selection to prioritize the selection of test inputs with high coverage scores [6][7][8][9][10]. Studies have shown the effectiveness of these approaches for distinguishing adversarial inputs but failed to demonstrate a positive correlation between coverage and DNN mispredictions [15][16][17][18]. Furthermore, for some coverage metrics, reaching maximum coverage can be easily achieved by selecting a few test inputs [15,19], thus putting into question the usefulness of such white-box DNN test selection approaches.…”

Section: Introduction (mentioning)

confidence: 99%

“…Moreover, the use of white-box DNN test selection approaches is hindered by the requirement for access to the internal structure of the DNN model or the training dataset. This can be a significant limitation, particularly when the model is proprietary or provided by a third party [16].…”

Section: Introduction (mentioning)

confidence: 99%

“…Another category of black-box metrics that has been successfully applied in testing traditional software systems is diversity [21][22][23]. However, only a few studies have investigated its effectiveness in testing DNN models [11,16,24]. Zhao et al [24] studied state-of-the-art (SOTA) test selection approaches for DNNs and found that they do not guarantee the selection of diverse test input sets.…”

Section: Introduction (mentioning)

confidence: 99%

“…Zhao et al [24] studied state-of-the-art (SOTA) test selection approaches for DNNs and found that they do not guarantee the selection of diverse test input sets. Also, Aghababaeyan et al [16] compared the fault-revealing power of several diversity metrics with coverage criteria. They found that geometric diversity (GD) [25] outperforms coverage criteria in terms of fault-revealing power and computational time.…”

Section: Introduction (mentioning)

confidence: 99%

“…Similar to failures in traditional software systems, many mispredictions result from the same faults in the DNN model and are therefore redundant [16,27]. This is why testing studies typically focus on faults, not failures [28,29].…”

Section: Introduction (mentioning)

confidence: 99%

DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks

Aghababaeyan¹,

et al. 2023

Preprint

Deep neural networks (DNNs) are widely used in various application domains such as image processing, speech recognition, and natural language processing. However, testing DNN models may be challenging due to the complexity and size of their input domain. Particularly, testing DNN models often requires generating or exploring large unlabeled datasets. In practice, DNN test oracles, which identify the correct outputs for inputs, often require expensive manual effort to label test data, possibly involving multiple experts to ensure labeling correctness. In this paper, we propose DeepGD, a black-box multi-objective test selection approach for DNN models. It reduces the cost of labeling by prioritizing the selection of test inputs with high fault-revealing power from large unlabeled datasets. DeepGD not only selects test inputs with high uncertainty scores to trigger as many mispredicted inputs as possible but also maximizes the probability of revealing distinct faults in the DNN model by selecting diverse mispredicted inputs. The experimental results conducted on four widely used datasets and five DNN models show that in terms of fault-revealing ability: (1) white-box, coverage-based approaches fare poorly, (2) DeepGD outperforms existing black-box test selection approaches in terms of fault detection, and (3) DeepGD also leads to better guidance for DNN model retraining when using selected inputs to augment the training set.
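The two objectives named in the abstract, uncertainty and diversity, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that uncertainty is measured with a Gini-style score over the model's softmax outputs, that geometric diversity is the determinant of the Gram matrix of per-input feature vectors, and that those feature vectors come from some external extractor. The function names `gini_uncertainty` and `geometric_diversity` are hypothetical, and the log-determinant is used only for numerical stability.

```python
import numpy as np


def gini_uncertainty(probs):
    """Gini-style uncertainty per input: 1 - sum(p_i^2).

    probs: array of shape (n_inputs, n_classes) with rows summing to 1.
    Assumption: this is one common black-box uncertainty score, chosen
    here for illustration only.
    """
    probs = np.asarray(probs, dtype=float)
    return 1.0 - np.sum(probs ** 2, axis=1)


def geometric_diversity(features):
    """Log-determinant of the Gram matrix V V^T of a selected subset.

    features: array of shape (n_selected, n_features), one row per input.
    Assumption: rows are feature vectors from an external extractor.
    Returns -inf when the rows are linearly dependent (zero volume).
    """
    v = np.asarray(features, dtype=float)
    gram = v @ v.T
    sign, logdet = np.linalg.slogdet(gram)
    return logdet if sign > 0 else float("-inf")


if __name__ == "__main__":
    # A confident prediction scores lower than an uncertain one.
    print(gini_uncertainty([[1.0, 0.0], [0.5, 0.5]]))
    # Orthogonal feature vectors span more volume than near-duplicates.
    print(geometric_diversity([[1.0, 0.0], [0.0, 1.0]]))
    print(geometric_diversity([[1.0, 0.0], [1.0, 0.1]]))
```

A multi-objective selector would then search for subsets that score well on both measures at once. The search procedure itself is left out here because the abstract does not describe it.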

