Rather than tediously writing unit tests manually, tools can be used to generate them automatically-sometimes even resulting in higher code coverage than manual testing. But how good are these tests at actually finding faults? To answer this question, we applied three state-of-the-art unit test generation tools for Java (Randoop, EvoSuite, and Agitar) to the 357 real faults in the Defects4J dataset and investigated how well the generated test suites perform at detecting these faults. Although the automatically generated test suites detected 55.7% of the faults overall, only 19.9% of all the individual test suites detected a fault. By studying the effectiveness and problems of the individual tools and the tests they generate, we derive insights to support the development of automated unit test generators that achieve a higher fault detection rate. These insights include 1) improving the obtained code coverage so that faulty statements are executed in the first instance, 2) improving the propagation of faulty program states to an observable output, coupled with the generation of more sensitive assertions, and 3) improving the simulation of the execution environment to detect faults that are dependent on external factors such as date and time.
Achieving high structural coverage is an important aim in software testing. Several search-based techniques have proved successful at automatically generating tests that achieve high coverage. However, despite the well-established arguments behind using evolutionary search algorithms (e.g., genetic algorithms) in preference to random search, it remains an open question whether the benefits can actually be observed in practice when generating unit test suites for object-oriented classes. In this paper, we report an empirical study on the effects of using a genetic algorithm (GA) to generate test suites over generating test suites incrementally with random search, by applying the EvoSuite unit test suite generator to 1,000 classes randomly selected from the SF110 corpus of open source projects. Surprisingly, the results show little difference between the coverage achieved by test suites generated with evolutionary search compared to those generated using random search. A detailed analysis reveals that the genetic algorithm covers more branches of the type where standard fitness functions provide guidance. In practice, however, we observed that the vast majority of branches in the analyzed projects provide no such guidance.
Generating unit tests automatically saves time over writing tests manually and can lead to higher code coverage. However, automatically generated tests are usually not based on realistic scenarios, and are therefore generally considered to be less readable. This places a question mark over their practical value: Every time a test fails, a developer has to decide whether this failure has revealed a regression fault in the program under test, or whether the test itself needs to be updated. Does the fact that automatically generated tests are harder to read outweigh the time-savings gained by their automated generation, and render them more of a hindrance than a help for software maintenance? In order to answer this question, we performed an empirical study in which participants were presented with an automatically generated or manually written failing test, and were asked to identify and fix the cause of the failure. Our experiment and two replications resulted in a total of 150 data points based on 75 participants. Whilst maintenance activities take longer when working with automatically generated tests, we found developers to be equally effective with manually written and automatically generated tests. This has implications on how automated test generation is best used in practice, and it indicates a need for research into the generation of more realistic tests.
Summary An important aim in software testing is constructing a test suite with high structural code coverage, that is, ensuring that most if not all of the code under test have been executed by the test cases comprising the test suite. Several search‐based techniques have proved successful at automatically generating tests that achieve high coverage. However, despite the well‐established arguments behind using evolutionary search algorithms (eg, genetic algorithms) in preference to random search, it remains an open question whether the benefits can actually be observed in practice when generating unit test suites for object‐oriented classes. In this paper, we report an empirical study on the effects of using evolutionary algorithms (including a genetic algorithm and chemical reaction optimization) to generate test suites, compared with generating test suites incrementally with random search. We apply the EVOSUITE unit test suite generator to 1000 classes randomly selected from the SF110 corpus of open‐source projects. Surprisingly, the results show that the difference is much smaller than one might expect: While evolutionary search covers more branches of the type where standard fitness functions provide guidance, we observed that, in practice, the vast majority of branches do not provide any guidance to the search. These results suggest that, although evolutionary algorithms are more effective at covering complex branches, a random search may suffice to achieve high coverage of most object‐oriented classes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.