Studying OpenCL-based Number Theoretic Transform for heterogeneous platforms

Haleplidis, Evangelos; Tsakoulis, Thanasis; El-Kady, Alexander; Dimopoulos, Charis; Koufopavlou, O.; Fournaris, Apostolos P.

doi:10.1109/dsd53832.2021.00058

Cited by 5 publications

(1 citation statement)

References 20 publications


“…NTT (Number Theoretic Transformation) and iNTT (inverse Number Theoretic Transformation) are used to accelerate polynomial multiplication. Many applications, such as fully homomorphic encryption [19]-[21], post-quantum cryptography [22]-[24], tend to use NTT and iNTT (inverse Number Theoretic Transformation) to reduce the time complexity of polynomial multiplication. Therefore, many domain-specific NTT hardware and software-hardware collaborative acceleration schemes have emerged [25], [26] in recent years.…”

Section: Introduction (mentioning)

confidence: 99%
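
The citation statement above notes that NTT and iNTT reduce the time complexity of polynomial multiplication. A minimal Python sketch of that idea follows. It is not taken from either paper: the modulus 998244353, the primitive root 3, and the names `ntt` and `poly_mul` are assumptions chosen for illustration.

```python
MOD = 998244353  # assumed NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3         # a primitive root modulo MOD


def ntt(a, invert=False):
    """Return the NTT (or inverse NTT) of a; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages; the block length doubles at each stage.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse of the root
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def poly_mul(f, g):
    """Multiply coefficient lists f and g modulo MOD via NTT, pointwise product, iNTT."""
    out_len = len(f) + len(g) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fa = ntt(list(f) + [0] * (size - len(f)))
    ga = ntt(list(g) + [0] * (size - len(g)))
    prod = [x * y % MOD for x, y in zip(fa, ga)]
    return ntt(prod, invert=True)[:out_len]
```

The transform costs O(n log n) modular operations, against O(n²) for schoolbook multiplication, which is the saving the statement refers to.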

Hardware Acceleration of Number Theoretic Transform for zk-SNARK

Zhao, Ding, Wang et al. 2022

Preprint


Zk-SNARK unleashes the great potential of ZKP (zero-knowledge proof) in the blockchain, distributed storage, etc. However, the proof-generation of zk-SNARK is excessively time intensive, making it a challenge to deploy a high-performance zk-SNARK in most real applications. As a result, NTT (Number Theoretic Transform), one of the most time-consuming parts in proof-generation, needs to be accelerated significantly. To address this issue, we propose a novel and efficient “data reordering” technique to enable a highly pipelined architecture, on which an FPGA-based hardware accelerator is designed to support the large-bitwidth and large-scale NTT tasks in zk-SNARK. Our architecture achieves a two-level pipeline: 1) the top-level pipeline is achieved among smaller NTT sub-tasks, which are decomposed from a large-scale NTT task; 2) the bottom-level pipeline is achieved in each sub-task, among butterfly operations with different step sizes. This architecture can effectively reduce the data dependency and memory access requirements, meanwhile, can be flexibly scaled to different scales of FPGAs. To balance computing efficiency and flexibility, the OpenCL equipped with HLS is used to implement the heterogeneous acceleration system. We prototype the accelerator on the AMD-Xilinx Alveo U50 card (UltraScale+ XCU50 FPGA). The evaluation results show that 1) our accelerator shows high scalability for different scales of FPGAs with a stable performance improvement; 2) it performs 1.95× faster than the one in PipeZK; 3) and it achieves 27.98×, 1.74× speedup and 6.9×, 6× energy efficiency improvement than AMD Ryzen 9 5900X single core and 12 cores respectively when integrated into the well-known ZKP open-source project, Bellman.
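
The abstract describes decomposing a large-scale NTT into smaller sub-tasks. The sketch below shows one generic way to do this, the textbook Cooley-Tukey ("four-step") split of a size n1·n2 transform into independent sub-NTTs. It is not the authors' "data reordering" technique or their pipeline; the modulus, the root, and the names `naive_ntt` and `decomposed_ntt` are assumptions for illustration only.

```python
MOD = 998244353  # assumed NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3         # a primitive root modulo MOD


def naive_ntt(x, w):
    """Reference O(n^2) transform: X[k] = sum_j x[j] * w^(j*k) mod MOD."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, MOD) for j in range(n)) % MOD
            for k in range(n)]


def decomposed_ntt(x, n1, n2):
    """Size n1*n2 NTT computed from n1 sub-NTTs of size n2 and n2 of size n1."""
    n = n1 * n2
    assert len(x) == n and (MOD - 1) % n == 0
    w = pow(ROOT, (MOD - 1) // n, MOD)  # primitive n-th root of unity

    # Step 1: n1 independent sub-NTTs of size n2 over strided inputs.
    cols = [naive_ntt(x[i::n1], pow(w, n1, MOD)) for i in range(n1)]

    # Step 2: multiply by the twiddle factors w^(i * k2).
    for i in range(n1):
        for k2 in range(n2):
            cols[i][k2] = cols[i][k2] * pow(w, i * k2, MOD) % MOD

    # Step 3: n2 independent sub-NTTs of size n1, written to transposed positions.
    out = [0] * n
    for k2 in range(n2):
        row = naive_ntt([cols[i][k2] for i in range(n1)], pow(w, n2, MOD))
        for k1 in range(n1):
            out[n2 * k1 + k2] = row[k1]
    return out
```

The sub-NTTs within step 1 do not depend on one another, and neither do those within step 3, which is the property that makes such a split attractive for pipelined or parallel hardware. In a real design each sub-NTT would itself be a butterfly network rather than the quadratic reference transform used here.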


Hardware acceleration of number theoretic transform for zk‐SNARK

Zhao, Ding, Wang et al. 2023

Engineering Reports


Zk‐SNARK unleashes the great potential of ZKP (zero‐knowledge proof) in the blockchain, distributed storage, and so forth. However, the proof‐generation of zk‐SNARK is excessively time intensive, making it a challenge to deploy a high‐performance zk‐SNARK in most real applications. As a result, NTT (Number Theoretic Transform), one of the most time‐consuming parts in proof‐generation, needs to be accelerated significantly. To address this issue, we propose a novel and efficient “data reordering” technique to enable a highly pipelined architecture, on which an FPGA‐based hardware accelerator is designed to support the large‐bitwidth and large‐scale NTT tasks in zk‐SNARK. This two‐level pipelined architecture can effectively reduce the data dependency and memory access requirements, meanwhile, can be flexibly scaled to different scales of FPGAs. To balance computing efficiency and flexibility, the OpenCL equipped with HLS is used to implement the heterogeneous acceleration system. We prototype the accelerator on the AMD‐Xilinx Alveo U50 card (UltraScale+ XCU50 FPGA). The evaluation results show that (1) our accelerator shows high scalability for different scales of FPGAs with a stable performance improvement; (2) it performs 1.95× faster than the one in PipeZK; (3) and it achieves 27.98×, 1.74× speedup and 6.9×, 6× energy efficiency improvement than AMD Ryzen 9 5900X single core and 12 cores respectively when integrated into the well‐known ZKP open‐source project, Bellman.
