Summary
Measuring the distance between two program executions is a fundamental problem in the dynamic analysis of software, and such a measure is useful in many test generation and debugging algorithms. This paper proposes a metric for measuring distance between executions and specializes it to an important application: determining the similarity of failing test cases for automated fault identification and localization when debugging with automatically generated compiler tests. The metric is based on a causal concept of distance: executions are similar to the degree that changes in the program itself, introduced by mutation, cause similar changes in the correctness of the executions. Specifically, if two failing test cases for a compiler can be made to pass by applying the same mutant, those two failures are more likely to be due to the same fault. We evaluate our metric using more than 50 faults and 2,800 test cases for two widely used real‐world compilers and demonstrate improvements over state‐of‐the‐art methods for fault identification and localization. A simple operator selection approach to reducing the number of mutants can reduce the cost of our approach by 70% while producing a gain in fault identification accuracy. We additionally show that our approach, although devised for compilers, is applicable as a conservative fault localization algorithm for other types of programs and can help triage certain types of crashes found in fuzzing non‐compiler programs more effectively than a state‐of‐the‐art technique.
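To make the causal notion of distance concrete, the following is a minimal sketch, not the formula used in the paper. It assumes each failing test is summarized by the set of mutants that make it pass, and it uses Jaccard distance over those sets as one plausible instantiation. The function name `repair_distance`, the test names, and the mutant identifiers are all hypothetical.

```python
from typing import Dict, FrozenSet


def repair_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard distance between two sets of repairing mutants.

    Assumption: each set holds the IDs of mutants that turn one failing
    test into a passing one. 0.0 means identical sets, 1.0 means disjoint.
    """
    union = a | b
    if not union:
        # No mutant repairs either test, so there is no causal evidence
        # of a shared fault; treat the pair as maximally distant.
        return 1.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical data: failing test -> mutants that make it pass.
repairs: Dict[str, FrozenSet[str]] = {
    "t1": frozenset({"m3", "m7"}),
    "t2": frozenset({"m3", "m7", "m9"}),
    "t3": frozenset({"m12"}),
}

d12 = repair_distance(repairs["t1"], repairs["t2"])  # share two of three mutants
d13 = repair_distance(repairs["t1"], repairs["t3"])  # share no mutants
print(d12 < d13)
```

Under this sketch, `t1` and `t2` would be judged more likely to stem from the same fault than `t1` and `t3`, since most of the mutants that repair one also repair the other.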