Towards Rigorous Evaluation of Binary Testing and Analysis by Joshua Bundt

Computer security research is an ever-evolving field that aims to make technology more secure. Attackers constantly seek out vulnerabilities in systems, and defenders strive to introduce new controls to prevent these attacks. Attack research typically involves demonstrating the validity of an attack through a proof of concept. In contrast, defense research requires a higher level of rigor to substantiate that defenses are secure under various conditions and against a willful adversary. In this thesis, we examine the state of rigor in a specific area of defense research: binary testing and analysis. Binary testing and analysis encompasses the tasks and techniques required to evaluate binary code, which is the machine-readable representation of software programs, in order to understand program behavior, identify vulnerabilities, and ensure correctness and security. To assess the robustness of the current techniques and to provide a more rigorous methodology, we first examine the utility of synthetic bug generation as a solution to the scarcity of real bugs for fuzz testing evaluation. We conducted a large-scale measurement study evaluating existing synthetic bug generators with eight fuzzers on 20 software libraries and found that synthetic bugs are easier to discover than organic bugs and that the most popular synthetic bug benchmark, LAVA-M, exhibits fundamental flaws that make it unsuitable to recommend for future research. Second, we propose a new workflow to enable humans to more effectively assist fuzz testing through compartment analysis. An empirical study of seven software libraries revealed that compartment analysis can significantly improve a fuzzing campaign even when conducted after a few hours of fuzzing. Finally, we consider the fragility of neural network binary disassemblers at the task of function boundary detection.
When comparing traditional disassemblers to neural binary disassemblers, we found the latter to be vulnerable to adversarial attacks that allow an attacker to degrade function boundary detection. In response, we propose an expanded set of benchmarks and adversarial techniques to provide a better evaluation of neural binary disassemblers. Throughout this dissertation, we propose and demonstrate improved methodologies for rigorously examining and assessing binary testing and analysis efficacy.
Acknowledgements

The PhD journey starts with someone agreeing to take a long-term risk on you despite knowing little more than your resume. For taking the initial risk, I would like to thank my advisor, Wil Robertson, who has guided me from start to finish while providing advice, unwavering optimism, and tremendous patience. A special thanks to Tim Leek, who took on mentoring me long before I started at Northeastern and continues to provide keen insight and frank opinions that shape my research efforts. I would also like to thank the rest of the members of my committee, Pete Manolios, Guevara Noubir, and Davide Balzarotti, who pr...