Summary
In recent years, several tools have been developed to automatically select test inputs from the code of the system under test. However, each of these tools has different advantages, and there is little detailed feedback available on their actual capabilities. To evaluate test input generators, this paper collects a set of programming language concepts that the tools should handle and maps these core concepts, along with challenging features such as handling the environment or multi‐threading, to 363 code snippets. These snippets serve as inputs for the tools. Next, the paper presents SETTE, an automated framework for executing and evaluating these snippets. Using SETTE, multiple experiments were performed on five Java‐based and one .NET‐based tool employing symbolic execution, search‐based, and random techniques. The generated test suites were compared in terms of coverage, size, generation time, and mutation score. The results highlight the strengths and weaknesses of each tool and approach and identify code constructs that are difficult for most of the tools to tackle. We hope that this research can serve as actionable feedback to tool developers and help practitioners assess the readiness of test input generation.