String-related defects are among the most prevalent and costly in modern software development. For example, in terms of frequency, cross-site scripting vulnerabilities have long surpassed traditional exploits like buffer overruns. The state of this problem is particularly disconcerting because it does not just affect legacy code: developing web applications today, even when adhering to best practices and using modern library support, remains error-prone.

A number of program analysis approaches aim to prevent or mitigate string-related defects; examples include static bug detectors and automated test case generators. Traditionally, this work has relied on built-in algorithms to reason about string-manipulating code. This arrangement is suboptimal for two reasons: first, it forces researchers to re-invent the wheel for each new analysis; and second, it does not encourage the independent improvement of domain-specific algorithms for handling strings.

In this dissertation, we present research on specialized decision algorithms for string constraints. Our high-level approach is to provide a constraint solving interface; a client analysis can use that interface to reason about strings in the same way it might use a SAT solver to reason about binary state. To this end, we identify a set of string constraints that captures common programming language constructs and permits efficient solving algorithms. We provide a core solving algorithm together with a machine-checkable proof of its correctness.

Next, we focus on performance. We evaluate a variety of data structures and algorithms in a controlled setting to inform our choice of each. Our final approach is based on two insights: (1) string constraints can be cast as an explicit search problem, and (2) to solve these constraints, we can instantiate the search space lazily through incremental refinement. These insights lead to substantial performance gains relative to competing approaches; our experimental results show our prototype to be several orders of magnitude faster across several published benchmarks.
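As a rough illustration of the two insights above, the following Python sketch treats a very simple string constraint (find a string that belongs to two regular languages at once) as a graph search over pairs of automaton states, and it creates those pairs only when the search reaches them. This is a minimal sketch under stated assumptions, not the algorithm developed in this dissertation: the names make_dfa and solve_intersection, the dictionary-based automaton encoding, and the two example languages are all hypothetical and chosen only for illustration.

```python
from collections import deque


def make_dfa(start, accepting, transitions):
    """Bundle a DFA as a plain dict (hypothetical encoding for this sketch).

    transitions maps (state, char) -> state; missing entries mean "reject".
    """
    return {"start": start, "accepting": set(accepting), "delta": dict(transitions)}


def solve_intersection(dfa_a, dfa_b, alphabet):
    """Return a shortest string accepted by both DFAs, or None if none exists.

    Product states (pairs of states) are never built up front; each pair is
    created only when the breadth-first search first reaches it.
    """
    start = (dfa_a["start"], dfa_b["start"])
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (qa, qb), word = queue.popleft()
        if qa in dfa_a["accepting"] and qb in dfa_b["accepting"]:
            return word  # satisfying assignment found
        for ch in alphabet:
            na = dfa_a["delta"].get((qa, ch))
            nb = dfa_b["delta"].get((qb, ch))
            if na is None or nb is None:
                continue  # one automaton rejects this extension
            nxt = (na, nb)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + ch))
    return None  # search space exhausted: the constraint is unsatisfiable


# Strings over {a, b} with an even number of a's.
even_a = make_dfa(0, {0}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
# Strings over {a, b} that end in "ab".
ends_ab = make_dfa(0, {2}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1,
                            (1, "b"): 2, (2, "a"): 1, (2, "b"): 0})
# Strings over {a, b} with exactly one a.
one_a = make_dfa(0, {1}, {(0, "a"): 1, (0, "b"): 0, (1, "b"): 1})

witness = solve_intersection(even_a, ends_ab, "ab")
no_witness = solve_intersection(even_a, one_a, "ab")
```

A client analysis would call solve_intersection much as it would call a SAT solver: it either receives a concrete witness string or learns that no string satisfies both constraints. Because the search stops at the first accepting pair, it may visit only part of the full product automaton.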
Chapter 1 Introduction

This dissertation focuses on a common source of software defects: string manipulation. String-related defects are among the most prevalent and costly in modern software development. Reasoning about strings is a key aspect in many types of program analysis work, including static bug finding [1, 5, 6, 7] and automated testing [8, 9, 10, 11, 12]. Until recently, this work has relied on ad hoc algorithms to formally reason about the values that string variables may take at runtime.

That situation is suboptimal for two reasons: first, it forces researchers to re-invent the wheel for each new tool; and second, it does not encourage the independent improvement of domain-specific reasoning for strings.

In this dissertation, we focus on the development of algorithms that enable formal reasoning about string operations; we refer to these algorithms as string decision procedures. Informally, a decision procedure is an algorithm that, given an input formula, answe...