DeepCruiser: Automated Guided Testing for Stateful Deep Learning Systems

Du, Xiaoning; Xie, Xiaofei; Li, Yi; Ma, Lei; Zhao, Jianjun; Liu, Yang

doi:10.48550/arxiv.1812.05339

Cited by 8 publications

(10 citation statements)

References 28 publications


“…Testing of machine learning algorithms, especially for sophisticated algorithms like deep learning systems, requires specific testing approaches. To that end, recently several adequacy criteria for deep learning systems like neuron coverage [24], surprise adequacy [19] or criteria derived from modeling deep learning systems as abstract state transition systems [8] were defined. However, testing algorithms is not sufficient as the integration of algorithms into systems can be complex, leading to problems and defects being injected along the way.…”

Section: Discussion (mentioning)

confidence: 99%

On Testing Data-Intensive Software Systems

Felderer

Russo

2019

Security and Quality in Cyber-Physical Systems Engineering

Today's software systems like cyber-physical production systems or big data systems have to process large volumes and diverse types of data which heavily influences the quality of these so-called data-intensive systems. However, traditional software testing approaches rather focus on functional behavior than on data aspects. Therefore, the role of data in testing has to be rethought and specific testing approaches for data-intensive software systems are required. Thus, the aim of this chapter is to contribute to this area by (1) providing basic terminology and background on data-intensive software systems and their testing, and (2) presenting the state of the research and the hot topics in the area. Finally, the directions of research and the new frontiers on testing data-intensive software systems are discussed.

“…To test audio-based deep learning systems, Du et al. [78] designed a set of transformations tailored to audio inputs considering background noise and volume variation. They first abstracted and extracted a probabilistic transition model from an RNN.…”

Section: Domain-specific Test Input Synthesis (mentioning)

confidence: 99%

“…We will introduce more related work about using domain-specific metamorphic relations of testing autonomous driving, Differentiable Neural Computer (DNC) [93], machine translation systems [123], [124], biological cell classification [79], and audio-based deep learning systems [78] in Section 8.…”

Section: Metamorphic Relations As Test Oracles (mentioning)

confidence: 99%

Machine Learning Testing: Survey, Landscapes and Horizons

Zhang

Harman

Ma

et al. 2022

IEEE Trans. Software Eng.

Self Cite

This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 128 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing. Index Terms: machine learning, software testing, deep neural network. Jie M. Zhang and Mark Harman are with CREST, University College London, United Kingdom. Mark Harman is also with Facebook London.

“…In addition to CV models, natural language processing (NLP) models and their critical applications in machine translation have also been tested [19,41,42,76]. We also notice recent works on testing RNNs and RL models [28,29,40,45,80]. MDPFuzzer tests FNNs, RL, IL, and MARL models for solving MDPs, where existing efforts in testing FNNs and RNNs are not applicable, as will be discussed in Sec.…”

Section: Related Work (mentioning)

confidence: 99%

MDPFuzz: Testing Models Solving Markov Decision Processes

Pang

Yuan

Wang

2021

Preprint

The Markov decision process (MDP) provides a mathematical framework for modeling sequential decision-making problems, many of which are crucial to security and safety, such as autonomous driving and robot control. The rapid development of artificial intelligence research has created efficient methods for solving MDPs, such as deep neural networks (DNNs), reinforcement learning (RL), and imitation learning (IL). However, these popular models for solving MDPs are neither thoroughly tested nor rigorously reliable. We present MDPFuzzer, the first blackbox fuzz testing framework for models solving MDPs. MDPFuzzer forms testing oracles by checking whether the target model enters abnormal and dangerous states. During fuzzing, MDPFuzzer decides which mutated state to retain by measuring whether it can reduce cumulative rewards or form a new state sequence. We design efficient techniques to quantify the "freshness" of a state sequence using Gaussian mixture models (GMMs) and dynamic expectation-maximization (DynEM). We also prioritize states with high potential of revealing crashes by estimating the local sensitivity of target models over states. MDPFuzzer is evaluated on five state-of-the-art models for solving MDPs, including supervised DNN, RL, IL, and multi-agent RL. Our evaluation includes scenarios of autonomous driving, aircraft collision avoidance, and two games that are often used to benchmark RL. During a 12-hour run, we find over 80 crash-triggering state sequences on each model. We show inspiring findings that crash-triggering states, though they look normal, induce distinct neuron activation patterns compared with normal states. We further develop an abnormal behavior detector to harden all the evaluated models and repair them with the findings of MDPFuzzer to significantly enhance their robustness without sacrificing accuracy.
