Developments in the automation of test data generation have greatly improved the efficiency of the software testing process, but the so-called oracle problem (deciding the pass or fail outcome of a test execution) remains a primarily manual, expensive, and error-prone activity. We present an approach that automatically detects passing and failing executions using cluster-based anomaly detection on dynamic execution data: firstly, on a system's input/output pairs alone, and secondly, on input/output pairs combined with execution traces. The key hypothesis is that failures will group into small clusters, whereas passing executions will group into larger ones. An evaluation on three systems with a range of faults supports this hypothesis: in many cases the small clusters were composed of at least 60% failures (and often more). Concentrating the failures in these small clusters substantially reduces the number of outputs that a developer would need to examine manually following a test run, and illustrates that the approach has the potential to improve the effectiveness and efficiency of the testing process.
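The clustering step described above can be illustrated with a short sketch. The following Python fragment is an assumption on our part, not the authors' implementation: it clusters numeric feature vectors derived from executions and flags members of unusually small clusters as suspected failures. The feature extraction, cluster count, and size threshold are all illustrative choices.

```python
# Illustrative sketch only: flag members of small clusters as suspected
# failures. Assumes executions have already been encoded as fixed-length
# numeric feature vectors (e.g. from input/output pairs and traces).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def flag_suspected_failures(features, n_clusters=20, small_fraction=0.05):
    """Cluster execution feature vectors; executions falling into clusters
    smaller than `small_fraction` of the test set are flagged for manual
    inspection as likely failures."""
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)
    counts = np.bincount(labels, minlength=n_clusters)
    small = counts < small_fraction * len(features)
    return small[labels]  # boolean mask: True = suspected failure

# Example: 200 executions described by 8 synthetic numeric features each
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
suspects = flag_suspected_failures(X)
print(f"{suspects.sum()} of {len(X)} executions flagged for review")
```

Under the paper's hypothesis, a developer would then inspect only the flagged executions rather than every output from the test run.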
A key component of software testing is deciding whether a test case has passed or failed: an expensive and error-prone manual activity. We present an approach to automatically classify passing and failing executions using semi-supervised learning on dynamic execution data (test inputs/outputs and execution traces). A small proportion of the test data is labelled as passing or failing and used to build a classifier, which then labels the remaining outputs as passing or failing tests. A range of learning algorithms is investigated using several faulty versions of three systems, along with varying types of data (inputs/outputs alone, or in combination with execution traces) and different labelling strategies (both failing and passing tests, or passing tests alone). The results show that in many cases labelling just a small proportion of the test cases (as low as 10%) is sufficient to build a classifier able to correctly categorise the large majority of the remaining test cases. This has important practical potential: when checking the test results of a system, a developer need only examine a small proportion of them and use this information to train a learning algorithm to classify the remainder automatically.
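As a rough illustration of this semi-supervised workflow, the sketch below uses scikit-learn's SelfTrainingClassifier as a stand-in for the learning algorithms the paper investigates; the synthetic features, the pass/fail ground truth, and the 10% labelling budget are assumptions for demonstration only.

```python
# Hedged sketch of the semi-supervised oracle workflow (not the paper's
# exact algorithms): label a small fraction of executions, train on them,
# and let the classifier label the rest.
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))          # synthetic execution features
y_true = (X[:, 0] > 1.0).astype(int)    # stand-in pass(0)/fail(1) verdicts

# Label only 10% of executions; mark the rest unlabelled (-1).
y_train = np.full(len(X), -1)
labelled = rng.choice(len(X), size=len(X) // 10, replace=False)
y_train[labelled] = y_true[labelled]

clf = SelfTrainingClassifier(SVC(probability=True))
clf.fit(X, y_train)
pred = clf.predict(X)

unlabelled = y_train == -1
acc = (pred[unlabelled] == y_true[unlabelled]).mean()
print(f"accuracy on the unlabelled executions: {acc:.2f}")
```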
In recent years, software testing research has produced notable advances in automated test data generation, but the corresponding oracle problem (a mechanism for determining the (in)correctness of an executed test case) remains a major obstacle. In this paper, we present a preliminary study investigating the application of anomaly detection techniques (based on clustering) to automatically build an oracle from a system's input/output pairs, based on the hypothesis that failures will tend to group into small clusters. The fault detection capability of the approach is evaluated on two systems, and the findings reveal that failing outputs do indeed tend to congregate in small clusters, suggesting that the approach is feasible and has the potential to reduce by an order of magnitude the number of outputs that would need to be manually examined following a test run.
The biggest obstacle to automated software testing is the construction of test oracles. Today, it is possible to generate enormous numbers of test cases for an arbitrary system that achieve a remarkably high level of coverage, but the effectiveness of these test cases is limited by the availability of test oracles that can distinguish failing executions. Previous work by the authors has explored the use of unsupervised and semi-supervised learning techniques to develop test oracles, so that the correctness of software outputs and behaviours on new test cases can be predicted [1], [2], [3], and experimental results demonstrate the promise of this approach. In this paper, we present an evaluation of test oracles based on machine learning over dynamic execution data (firstly, input/output pairs, and secondly, input/output pairs combined with execution traces), comparing their effectiveness with an existing technique from the specification mining domain (the data invariant detector Daikon [4]). The two approaches are evaluated on a range of mid-sized systems and compared in terms of their fault detection ability and false positive rate. The empirical study also discusses the major limitations and the most important properties of applying machine learning techniques as test oracles in practice, and gives a road map for further research to tackle some of these limitations, such as accuracy and scalability. The results show that in most cases the semi-supervised learning techniques performed far better as automated test classifiers than Daikon (especially when input/output pairs were augmented with their execution traces). However, there is one system for which our strategy struggles and Daikon performed far better. Furthermore, the unsupervised learning techniques performed on a par with Daikon in several cases, particularly when input/output pairs were used together with execution traces.
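The comparison criteria named above (fault detection ability and false positive rate) reduce to simple counts once each oracle's verdicts are known. The sketch below shows one plausible way to compute them; the verdict arrays are invented for illustration, and Daikon itself is not invoked.

```python
# Minimal sketch of the evaluation metrics: given an oracle's pass/fail
# verdicts and the ground truth, compute fault detection rate (share of
# real failures flagged) and false positive rate (share of passes
# mis-flagged). Verdict data here is purely illustrative.
import numpy as np

def oracle_metrics(verdicts, truth):
    """verdicts/truth: 1 = flagged/actual failure, 0 = pass."""
    tp = np.sum((verdicts == 1) & (truth == 1))
    fp = np.sum((verdicts == 1) & (truth == 0))
    detection = tp / max(truth.sum(), 1)       # failures caught
    fpr = fp / max((truth == 0).sum(), 1)      # passes mis-flagged
    return detection, fpr

truth = np.array([0, 0, 1, 1, 0, 1, 0, 0])
ml_oracle = np.array([0, 0, 1, 1, 0, 0, 1, 0])
print(oracle_metrics(ml_oracle, truth))  # ≈ (0.67, 0.2)
```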