Yanju Chen scite author profile

Martins

2019

There has been significant interest in applying programming-by-example to automate repetitive and tedious tasks. However, due to the incomplete nature of input-output examples, a synthesizer may generate programs that pass the examples but do not match the user intent. In this paper, we propose Mars, a novel synthesis framework that takes as input a multi-layer specification composed of input-output examples, textual description, and partial code snippets that capture the user intent. To accurately capture the user intent from the noisy and ambiguous description, we propose a hybrid model that combines the power of an LSTM-based sequence-to-sequence model with the Apriori algorithm for mining association rules through unsupervised learning. We reduce the problem of solving a multi-layer specification synthesis to a Max-SMT problem, where hard constraints encode well-typed concrete programs and soft constraints encode the user intent learned by the hybrid model. We instantiate our hybrid model to the data wrangling domain and compare its performance against Morpheus, a state-of-the-art synthesizer for data wrangling tasks. Our experiments demonstrate that our approach outperforms Morpheus in terms of running time and solved benchmarks. For challenging benchmarks, our approach can suggest candidates with rankings that are an order of magnitude better than Morpheus, which leads to running times that are 15x faster than Morpheus. CCS CONCEPTS • Software and its engineering → Programming by example; Automatic programming.
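The hard/soft constraint split described in this abstract can be illustrated with a toy sketch. It is not the Mars implementation and uses no SMT solver: it enumerates candidate programs by brute force, discards ill-typed ones (the hard constraint), and keeps the candidate with the highest total soft-constraint weight. The component names, type signatures, and weights are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical component signatures: name -> (input type, output type).
COMPONENTS = {
    "filter": ("table", "table"),
    "group_by": ("table", "grouped"),
    "summarise": ("grouped", "table"),
    "select": ("table", "table"),
}

def well_typed(prog, src="table", dst="table"):
    """Hard constraint: each component must accept the previous output type."""
    cur = src
    for name in prog:
        inp, out = COMPONENTS[name]
        if inp != cur:
            return False
        cur = out
    return cur == dst

# Soft constraints as (predicate, weight) pairs. The weights are made up;
# in the paper they would come from the learned hybrid model.
SOFT = [
    (lambda p: "group_by" in p, 5),
    (lambda p: "summarise" in p, 4),
    (lambda p: "select" not in p, 1),
]

def score(prog):
    """Total weight of the soft constraints that a program satisfies."""
    return sum(weight for pred, weight in SOFT if pred(prog))

def best_program(length):
    """Brute-force stand-in for Max-SMT: the best-scoring well-typed program."""
    candidates = [p for p in product(COMPONENTS, repeat=length) if well_typed(p)]
    return max(candidates, key=score)

print(best_program(2))
```

A real Max-SMT encoding would hand both kinds of constraints to a solver instead of enumerating candidates; the sketch only shows how hard constraints prune the space while soft constraints rank what remains.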

Proc. ACM Program. Lang.

Automated Detection of Under-Constrained Circuits in Zero-Knowledge Proofs

Pailoor

Chen

Wang

et al. 2023

As zero-knowledge proofs gain increasing adoption, the cryptography community has designed domain-specific languages (DSLs) that facilitate the construction of zero-knowledge proofs (ZKPs). Many of these DSLs, such as Circom, facilitate the construction of arithmetic circuits, which are essentially polynomial equations over a finite field. In particular, given a program in a zero-knowledge proof DSL, the compiler automatically produces the corresponding arithmetic circuit. However, a common and serious problem is that the generated circuit may be underconstrained, either due to a bug in the program or a bug in the compiler itself. Underconstrained circuits admit multiple witnesses for a given input, so a malicious party can generate bogus witnesses, thereby causing the verifier to accept a proof that it should not. Because of the increasing prevalence of such arithmetic circuits in blockchain applications, several million dollars' worth of cryptocurrency have been stolen due to underconstrained arithmetic circuits. Motivated by this problem, we propose a new technique for finding ZKP bugs caused by underconstrained polynomial equations over finite fields. Our method performs semantic reasoning over the finite field equations generated by the compiler to prove whether or not each signal is uniquely determined by the input. Our proposed approach combines SMT solving with lightweight uniqueness inference to effectively reason about underconstrained circuits. We have implemented our proposed approach in a tool called QED² and evaluate it on 163 Circom circuits. Our evaluation shows that QED² can successfully solve 70% of these benchmarks, meaning that it either verifies the uniqueness of the output signals or finds a pair of witnesses that demonstrate non-uniqueness of the circuit. Furthermore, QED² has found 8 previously unknown vulnerabilities in widely-used circuits.
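To make "underconstrained" concrete, here is a minimal sketch over a tiny prime field. It checks by exhaustive enumeration whether the output signal is uniquely determined by the input, which is only feasible at toy scale and is not the SMT-plus-inference technique the abstract describes. The circuit is an is-zero check written as two polynomial constraints; the field size, the signal names, and the "buggy" variant with one constraint dropped are assumptions made for illustration.

```python
from itertools import product

P = 7  # tiny prime field, chosen only so that enumeration is feasible

def outputs_for(constraints, x):
    """Set of output values reachable by some witness (inv, out) for input x."""
    return {
        out
        for inv, out in product(range(P), repeat=2)
        if all(c(x, inv, out) for c in constraints)
    }

# Is-zero circuit: out = 1 - x*inv  and  x*out = 0  (all arithmetic mod P).
full = [
    lambda x, inv, out: (out - (1 - x * inv)) % P == 0,
    lambda x, inv, out: (x * out) % P == 0,
]

# Hypothetical bug: the second constraint was forgotten.
buggy = full[:1]

def underconstrained_inputs(constraints):
    """Inputs for which more than one output value satisfies the constraints."""
    return [x for x in range(P) if len(outputs_for(constraints, x)) > 1]

print(underconstrained_inputs(full))   # no input admits two outputs
print(underconstrained_inputs(buggy))  # every nonzero input does
```

In the buggy variant, a prover can pick any `inv` for a nonzero input and obtain any output it likes, which is the kind of bogus witness the abstract warns about.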

Synthesis-powered optimization of smart contracts via data type refactoring

Proc. ACM Program. Lang.

Wang

Goyal

et al. 2022

Since executing a smart contract on the Ethereum blockchain costs money (measured in gas), smart contract developers spend significant effort in reducing gas usage. In this paper, we propose a new technique for reducing the gas usage of smart contracts by changing the underlying data layout. Given a smart contract P and a type-level transformation, our method automatically synthesizes a new contract P′ that is functionally equivalent to P. Our approach provides a convenient DSL for expressing data type refactorings and employs program synthesis to generate the new version of the contract. We have implemented our approach in a tool called Solidare and demonstrate its capabilities on real-world smart contracts from Etherscan and GasStation. In particular, we show that our approach is effective at automating the desired data layout transformation and that it is useful for reducing gas usage of smart contracts that use rich data structures.

Learning Contract Invariants Using Reinforcement Learning

Liu

Tan

et al. 2022

Tree traversal synthesis using domain-specific symbolic compilation

Liu

et al. 2022

Efficient computation on tree data structures is important in compilers, numeric computations, and web browser layout engines. Efficiency is achieved by statically scheduling the computation into a small number of tree traversals and by performing the traversals in parallel when possible. Manual design of such traversals leads to bugs, as observed in web browsers. Automatic schedulers avoid these bugs but they currently cannot explore a space of legal traversals, which prevents exploring the trade-offs between parallelism and minimizing the number of traversals. We describe Hecate, a synthesizer of tree traversals that can produce both serial and parallel traversals. A key feature is that the synthesizer is extensible by the programmer, who can define a template for new kinds of traversals. Hecate is constructed as a solver-aided domain-specific language, meaning that the synthesizer is generated automatically by translating the tree traversal DSL to an SMT solver that synthesizes the traversals. We improve on the general-purpose solver-aided architecture with a scheduling-specific symbolic evaluation that maintains the engineering advantages of solver-aided design but generates an ILP encoding that is much more efficient to solve than SMT constraints. On the set of Grafter problems, Hecate synthesizes traversals that trade off traversal fusion to exploit parallelism. Additionally, Hecate allows defining a tree data structure with an arbitrary number of children. Together, parallelism and data structure improvements accelerate the computation 2× on a tree rendering problem. Finally, Hecate's domain-specific symbolic compilation accelerates synthesis 3× compared to the general-purpose compilation to an SMT solver; when scheduling a CSS engine traversal, this ILP-based synthesis executes orders of magnitude faster. CCS CONCEPTS • Software and its engineering → Automatic programming.
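The idea of scheduling a computation into fewer tree traversals can be sketched by hand. The example below is not Hecate output and involves no synthesis: it computes two attributes (subtree size and subtree sum) first with two separate traversals and then with one manually fused traversal, counting node visits in each case. The `Node` class, the attribute choice, and the visit counter are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    value: int
    children: list = field(default_factory=list)  # arbitrary number of children

def size(node, visits):
    """First traversal: number of nodes in the subtree."""
    visits[0] += 1
    return 1 + sum(size(c, visits) for c in node.children)

def total(node, visits):
    """Second traversal: sum of values in the subtree."""
    visits[0] += 1
    return node.value + sum(total(c, visits) for c in node.children)

def size_and_total(node, visits):
    """Fused traversal: both attributes computed in one post-order pass."""
    visits[0] += 1
    s, t = 1, node.value
    for c in node.children:
        cs, ct = size_and_total(c, visits)
        s += cs
        t += ct
    return s, t

tree = Node(1, [Node(2, [Node(4)]), Node(3)])

unfused_visits = [0]
unfused = (size(tree, unfused_visits), total(tree, unfused_visits))

fused_visits = [0]
fused = size_and_total(tree, fused_visits)

print(unfused, unfused_visits[0])
print(fused, fused_visits[0])
```

Fusion halves the number of node visits here, but the two unfused traversals are independent and could run in parallel; that tension is the trade-off between fusion and parallelism that the abstract refers to.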