Abstract. Spreadsheets are by far the most prominent example of end-user programs of ample size and substantial structural complexity. In addition, spreadsheets are usually not tested very rigorously and thus often contain faults. Locating faults is a hard task due to the size and the structure, which is usually not directly visible to the user, i.e., the functions are hidden behind the cells and only the computed values are presented. Hence, there is a strong need for debugging support. In this paper, we adapt three program-debugging approaches that were originally designed for more traditional procedural or object-oriented programming languages: Spectrum-based Fault Localization, Spectrum-Enhanced Dynamic Slicing, and Constraint-based Debugging. Besides the theoretical foundations, we present a more sophisticated empirical evaluation, including a comparison of these approaches. The empirical evaluation shows that Sfl (Spectrum-based Fault Localization) and Sendys (Spectrum ENhanced Dynamic Slicing) are the most promising techniques.
Spreadsheets are by far the most prominent example of end-user programs of ample size and substantial structural complexity. They are usually not thoroughly tested, so they often contain faults. Debugging spreadsheets is a hard task due to their size and structure, which is usually not directly visible to the user, i.e., the functions are hidden and only the computed values are presented. One way to locate faulty cells in spreadsheets is to adapt software debugging approaches designed for traditional procedural or object-oriented programming languages. One such approach is spectrum-based fault localization (Sfl). In this paper, we study the impact of different similarity coefficients on the accuracy of Sfl applied to the spreadsheet domain. Our empirical evaluation shows that three of the 42 studied coefficients (Ochiai, Jaccard and Sorensen-Dice) require less effort from the user when inspecting the diagnostic report, and that they can be used interchangeably without a loss of accuracy. In addition, we illustrate the influence of the number of correct and incorrect output cells on the diagnostic report.

… of the workers (cells F2:F3) and the total working hours (cell D4). Figure 1c shows the formula view of the faulty spreadsheet from Fig. 1b. In this faulty spreadsheet, the computation of the total hours for the worker "Green" (cell D2) is faulty because the programmer of the spreadsheet unintentionally specified the wrong range in the SUM formula. This can happen, for example, when a programmer adds a new week but forgets to adapt some of the calculations. Because of this fault, the wage of the worker "Green" (cell F2) and the total hours (cell D4) are erroneous. In the following sections, we show how to use Sfl to pinpoint the faulty cell D2.

The remainder of the paper is organized as follows: Section 2 deals with related work and discusses existing spreadsheet debugging and testing techniques. Section 3 deals with the syntax and semantics of spreadsheets and defines the spreadsheet debugging problem. Section 4 explains the changes required to apply Sfl to spreadsheets and discusses the similarity coefficients. Section 5 presents the setup and the results of the empirical evaluation. Finally, Section 6 concludes the paper and presents ideas for future empirical evaluations.
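To make the role of the similarity coefficients concrete, the following is a minimal sketch (not the authors' implementation) of how Sfl could rank the formula cells of the running example with Ochiai, Jaccard and Sorensen-Dice. It assumes that each output cell acts as one test case whose "execution" is its cone, i.e., the cell itself plus the formula cells it references, and that the user has marked F2 and D4 as incorrect and F3 as correct. Because the text only partly describes Fig. 1c, the dependencies below (F2 uses D2, F3 uses D3, D4 sums D2:D3) are hypothetical, as are the names `CONES`, `counts` and `rank`.

```python
from math import sqrt

# Hypothetical cones for the running example (assumed; Fig. 1c is only partly
# described). Each output cell maps to itself plus the formula cells it uses.
CONES = {
    "F2": {"F2", "D2"},        # wage of "Green", assumed to be computed from D2
    "F3": {"F3", "D3"},        # wage of the second worker, assumed to use D3
    "D4": {"D4", "D2", "D3"},  # total hours, assumed to be SUM(D2:D3)
}
INCORRECT = {"F2", "D4"}  # output cells the user marked as wrong
CORRECT = {"F3"}          # output cells the user marked as right


def counts(cell):
    """Return (n11, n10, n01) for one formula cell.

    n11: incorrect outputs whose cone contains the cell
    n10: correct outputs whose cone contains the cell
    n01: incorrect outputs whose cone does not contain the cell
    """
    n11 = sum(cell in CONES[o] for o in INCORRECT)
    n10 = sum(cell in CONES[o] for o in CORRECT)
    n01 = len(INCORRECT) - n11
    return n11, n10, n01


def ochiai(n11, n10, n01):
    denom = sqrt((n11 + n01) * (n11 + n10))
    return n11 / denom if denom else 0.0


def jaccard(n11, n10, n01):
    denom = n11 + n01 + n10
    return n11 / denom if denom else 0.0


def sorensen_dice(n11, n10, n01):
    denom = 2 * n11 + n01 + n10
    return 2 * n11 / denom if denom else 0.0


def rank(coefficient):
    """Rank all formula cells by suspiciousness under the given coefficient."""
    cells = set().union(*CONES.values())
    scores = {c: coefficient(*counts(c)) for c in cells}
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for name, coef in [("Ochiai", ochiai), ("Jaccard", jaccard),
                       ("Sorensen-Dice", sorensen_dice)]:
        print(name, [(c, round(s, 3)) for c, s in rank(coef)])
```

Under these assumptions, all three coefficients place D2 first (score 1.0), followed by D4 and F2 (tied), then D3 and finally F3. This tiny example only illustrates the mechanics of the ranking; it is not evidence for the empirical results reported in the paper.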
Related work

Our paper builds on the work of Lo et al. (2010) and Lucia et al. (2013), who compare the fault localization capabilities of 42 similarity coefficients for programs written in C. In contrast, we focus on spreadsheets. To the best of our knowledge, no paper has yet been published that compares similarity coefficients for