Validating SMT solvers via semantic fusion

Winterer, Dominik; Zhang, Chengyu; Su, Zhendong

doi:10.1145/3385412.3385985

Cited by 55 publications

(24 citation statements)

References 21 publications


“…[42] to test the performance regression bugs in DBMSs. Furthermore, differential testing is powerful and applied to different domains such as testing SMT solvers [62,63],…”

Section: Related Work 61 Testing CPU Emulators

mentioning

confidence: 99%

EXAMINER: automatically locating inconsistent instructions between real devices and CPU emulators for ARM

Jiang, Zhou, et al. 2022

Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems


Emulators are widely used to build dynamic analysis frameworks due to their fine-grained tracing capability, full system monitoring functionality, and scalability of running on different operating systems and architectures. However, whether emulators are consistent with real devices is unknown. To understand this problem, we aim to automatically locate inconsistent instructions, which behave differently between emulators and real devices. We target the ARM architecture, which provides machine-readable specifications. Based on the specification, we propose a sufficient test case generator by designing and implementing the first symbolic execution engine for the ARM architecture specification language (ASL). We generate 2,774,649 representative instruction streams and conduct differential testing between four ARM real devices in different architecture versions (i.e., ARMv5, ARMv6, ARMv7, and ARMv8) and three state-of-the-art emulators (i.e., QEMU, Unicorn, and Angr). We locate a huge number of inconsistent instruction streams (171,858 for QEMU, 223,264 for Unicorn, and 120,169 for Angr). We find that undefined implementation in the ARM manual and bugs in emulators are the major causes of inconsistencies. Furthermore, we discover 12 bugs, which influence commonly used instructions (e.g., BLX). With the inconsistent instructions, we build three security applications and demonstrate the capability of these instructions on detecting emulators, anti-emulation, and anti-fuzzing.

“…We encode a theory in Z3 to detect equivalent mutants, in Step 4. We use the latest version of Z3 after fixing the bugs found by Winterer et al [Winterer et al 2020]. Listing 2 specifies how to prove a theorem using the Z3 Python API.…”

Section: Technique

mentioning

confidence: 99%

A Lightweight Technique to Identify Equivalent Mutants

Souza, Gheyi 2020

Anais Estendidos Do XI Congresso Brasileiro De Software: Teoria E Prática (CBSoft 2020)


Mutation analysis is a popular but costly approach to assess the quality of test suites. Equivalent mutants are useless and contribute to increased costs. We propose a lightweight technique to identify equivalent mutants by proving equivalences with Z3 in the context of weak mutation testing. To evaluate our approach, we apply our technique to 40 mutation targets (mutations of an expression or statement) and automatically identify 13 equivalent mutations for seven mutation targets. We manually confirm that the equivalent mutants detected by our technique are indeed equivalent. Moreover, we evaluate our approach in the context of strong mutation testing against mutants generated by MuJava for 5 projects. Our technique detects all equivalent mutants detected by TCE. The results of our technique can be useful to improve mutation testing tools by avoiding the application of 13 mutations for 7 mutation targets.


“…In practice, tool developers focus on testing or verifying their use of the symbolic evaluator, and trust the evaluator and the solver to be correct. The trust in solvers is based on decades of community investment in their testing [Winterer et al 2020], validation [Cruz-Filipe et al 2017], and verification [Blanchette et al 2017]. But the trust in reusable evaluators rests on a weaker foundation of ad-hoc testing and manual inspection.…”

Section: Introduction

mentioning

confidence: 99%

A formal foundation for symbolic evaluation with merging

Porncharoenwase, Nelson, Wang, et al. 2022

Proc. ACM Program. Lang.


Reusable symbolic evaluators are a key building block of solver-aided verification and synthesis tools. A reusable evaluator reduces the semantics of all paths in a program to logical constraints, and a client tool uses these constraints to formulate a satisfiability query that is discharged with SAT or SMT solvers. The correctness of the evaluator is critical to the soundness of the tool and the domain properties it aims to guarantee. Yet so far, the trust in these evaluators has been based on an ad-hoc foundation of testing and manual reasoning. This paper presents the first formal framework for reasoning about the behavior of reusable symbolic evaluators. We develop a new symbolic semantics for these evaluators that incorporates state merging. Symbolic evaluators use state merging to avoid path explosion and generate compact encodings. To accommodate a wide range of implementations, our semantics is parameterized by a symbolic factory, which abstracts away the details of merging and creation of symbolic values. The semantics targets a rich language that extends Core Scheme with assumptions and assertions, and thus supports branching, loops, and (first-class) procedures. The semantics is designed to support reusability, by guaranteeing two key properties: legality of the generated symbolic states, and the reducibility of symbolic evaluation to concrete evaluation. Legality makes it simpler for client tools to formulate queries, and reducibility enables testing of client tools on concrete inputs. We use the Lean theorem prover to mechanize our symbolic semantics, prove that it is sound and complete with respect to the concrete semantics, and prove that it guarantees legality and reducibility. To demonstrate the generality of our semantics, we develop Leanette, a reference evaluator written in Lean, and Rosette 4, an optimized evaluator written in Racket. 
We prove Leanette correct with respect to the semantics, and validate Rosette 4 against Leanette via solver-aided differential testing. To demonstrate the practicality of our approach, we port 16 published verification and synthesis tools from Rosette 3 to Rosette 4. Rosette 3 is an existing reusable evaluator that implements the classic merging semantics, adopted from bounded model checking. Rosette 4 replaces the semantic core of Rosette 3 but keeps its optimized symbolic factory. Our results show that Rosette 4 matches the performance of Rosette 3 across a wide range of benchmarks, while providing a cleaner interface that simplifies the implementation of client tools.
