Strings are widely used in modern programming languages in a variety of scenarios. For instance, strings are used to build Structured Query Language (SQL) queries that are then executed. Malformed strings may lead to subtle bugs, and non-sanitized strings may raise security issues in an application. For these reasons, applying static analysis to compute safety properties of string values at compile time is particularly appealing.

In this article, we propose a generic approach to the static analysis of string values based on abstract interpretation. In particular, we design a suite of abstract semantics for strings, where each abstract domain tracks a different kind of information. We discuss the trade-off between efficiency and accuracy when using such domains to capture the properties of interest. In this way, the analysis can be tuned to different levels of precision and efficiency, and it can target specific properties. The concrete semantics of the string operators we consider is approximated in different ways by five abstract domains.

In addition, after 30 years of practice with numerical domains, it is clear that a monolithic domain that is precise on any program and property (e.g., Polyhedra [14]) sacrifices efficiency, whereas achieving scalability requires approximations tailored to a given property (e.g., Pentagons [17]) or class of programs (e.g., ASTRÉE [18]). With this in mind, we develop several domains within the same framework, so that the analysis can be tuned to different levels of precision and efficiency with respect to the analyzed class of programs and property. Other abstractions are possible and welcome, and we expect our framework to be generic enough to support them.

This article‡ is structured as follows. In the rest of this section, we introduce two running examples and recall some basic concepts of abstract interpretation.
Section 2 defines the syntax of the string operators considered in the rest of the article. Section 3 introduces their concrete semantics, whereas Section 4 formalizes the five abstract domains, which form the core of this work, and uses them to analyze the running examples. Section 5 presents further experimental results. Finally, Section 6 discusses related work, and Section 7 concludes.
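To give a concrete feel for what it means for an abstract domain to track "a different kind of information", the following is a minimal sketch of one hypothetical string domain that records only the longest known prefix of a string. The class name `Prefix` and its operations (`of`, `top`, `join`, `concat`, `mustStartWith`) are illustrative assumptions, not necessarily one of the five domains formalized in Section 4. An element with prefix `p` stands for every string that starts with `p`; the empty prefix is the top element.

```java
import java.util.Objects;

/** Hypothetical prefix abstract domain (illustrative sketch only):
 *  an element represents every string that starts with `prefix`. */
final class Prefix {
    final String prefix;

    Prefix(String prefix) { this.prefix = Objects.requireNonNull(prefix); }

    /** Abstraction of a single known string: the string itself is its longest prefix. */
    static Prefix of(String s) { return new Prefix(s); }

    /** Top element: the empty prefix, i.e., any string at all. */
    static Prefix top() { return new Prefix(""); }

    /** Least upper bound (used at control-flow joins): the longest common prefix. */
    Prefix join(Prefix other) {
        int n = Math.min(prefix.length(), other.prefix.length());
        int i = 0;
        while (i < n && prefix.charAt(i) == other.prefix.charAt(i)) i++;
        return new Prefix(prefix.substring(0, i));
    }

    /** Concatenation: the result surely starts with this prefix; since the domain
     *  does not know where the left operand ends, nothing is kept from `other`. */
    Prefix concat(Prefix other) { return this; }

    /** Sound check: returns true only if every represented string starts with p. */
    boolean mustStartWith(String p) { return prefix.startsWith(p); }

    @Override public String toString() { return prefix + "*"; }
}
```

For example, joining the abstractions of "SELECT name FROM t" and "SELECT id FROM t" yields the prefix "SELECT ", so the domain can still prove that the query starts with SELECT, while everything after that prefix is lost. Every operation runs in time linear in the prefix length, which places such a domain at the cheap but coarse end of the efficiency/accuracy trade-off discussed above.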
‡ The article is a fully revised and extended version of [19].

Running examples

Throughout the article, we refer to the two examples reported in Figures 1(a) and 1(b). The first Java program, prog1, is taken from [7]; it dynamically builds an SQL query by concatenating several strings. One of these concatenations is applied only if a given input value, unknown at compile time, is not null. We are interested in checking whether the SQL query resulting from