Mutation testing has been actively investigated by researchers since the 1970s, and remarkable advances have been achieved in its concepts, theory, technology, and empirical evidence. While the most influential realisations have been summarised by existing literature reviews, we lack insight into how mutation testing is actually applied. Our goal is to identify and classify the main applications of mutation testing and to analyse the level of replicability of empirical studies related to mutation testing. To this end, this paper presents a systematic literature review of the application perspective of mutation testing, based on a collection of 191 papers published between 1981 and 2015. In particular, we analysed in which quality assurance processes mutation testing is used and which mutation tools and mutation operators are employed. Additionally, we investigated how the inherent core problems of mutation testing, i.e., the equivalent mutant problem and the high computational cost, are addressed in actual usage. The results show that most studies use mutation testing as an assessment tool targeting unit tests, and that many of the supporting techniques for making mutation testing applicable in practice are still underdeveloped. Based on our observations, we made 9 recommendations for future work, including an important suggestion on how to report mutation testing appropriately in testing experiments.
KEYWORDS
application, mutation testing, systematic literature review

1 INTRODUCTION

Mutation testing is defined by Jia and Harman [1] as a fault-based testing technique that provides a testing criterion called the mutation adequacy score. This score can be used to measure the effectiveness of a test set in terms of its ability to detect faults [1]. The principle of mutation testing is to introduce syntactic changes into the original program to generate faulty versions (called mutants) according to well-defined rules (mutation operators) [2]. Mutation testing originated in the 1970s with works by Lipton [3], DeMillo et al. [4], and Hamlet [5] and has been a very active research field over the last few decades. The activity of the field is evidenced in part by the extensive survey of more than 390 papers on mutation testing that Jia and Harman published in 2011 [1]. Their survey highlights the research achievements made over the years, including the development of tools for a variety of languages and the empirical studies performed [1]. Additionally, they highlight some of the actual and inherent problems of mutation testing, among others: (1) the high computational cost caused by generating and executing numerous mutants and (2) the considerable, time-consuming human effort required by the test oracle problem and by equivalent mutant detection.

While existing surveys (e.g., [1,2,6]) provide a good overview of the most influential realisations in research, we lack insight into how mutation testing is actually applied. Specifically, we are interested in analysing ...