Homomorphic encryption is a tool that enables computation on encrypted data and thus has applications in privacy-preserving cloud computing. Though conceptually powerful, implementing homomorphic encryption is very challenging, and software implementations on general-purpose computers are typically extremely slow. In this paper we present our domain-specific architecture on a heterogeneous Arm+FPGA platform to accelerate homomorphic computing on encrypted data. We design a custom co-processor on the FPGA for the computationally expensive operations of the well-known Fan-Vercauteren (FV) homomorphic encryption scheme, and use the Arm processor as a server for executing different homomorphic applications in the cloud with this FPGA-based co-processor. We use the most recent arithmetic and algorithmic optimization techniques and perform design-space exploration at different levels of the implementation hierarchy. In particular, we apply circuit-level and block-level pipeline strategies to boost the clock frequency and increase the throughput, respectively. To reduce computation latency, we use parallel processing at all levels. Starting from these highly optimized building blocks, we gradually build our multi-core, multi-processor architecture. We implemented and tested our optimized domain-specific programmable architecture on the Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit. At a 200 MHz FPGA clock, our implementation achieves over a 13x speedup with respect to a highly optimized software implementation of the FV homomorphic encryption scheme on an Intel i5 processor running at 1.8 GHz.
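To make the offloaded workload concrete, the sketch below shows, in plain Python and purely as an illustration rather than the paper's hardware design, the kind of operation that dominates FV homomorphic multiplication: multiplying polynomials in the ring R_q = Z_q[x]/(x^n + 1). The tiny parameters n and q are placeholders, not the implementation's parameter set.

```python
# Illustrative sketch of negacyclic polynomial multiplication in
# R_q = Z_q[x]/(x^n + 1), the core arithmetic accelerated on the FPGA.
# n and q here are placeholders chosen only to keep the example small.
n = 8              # polynomial length (real FV parameters use much larger n)
q = 2**30 + 3      # ciphertext modulus (placeholder value)

def poly_mul_negacyclic(a, b):
    """Schoolbook multiplication of coefficient lists a, b modulo (x^n + 1, q)."""
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1 in the ring, so wrapped terms pick up a sign flip
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c

# Example: (1 + 2x) * (3 + x^2) = 3 + 6x + x^2 + 2x^3
print(poly_mul_negacyclic([1, 2, 0, 0, 0, 0, 0, 0], [3, 0, 1, 0, 0, 0, 0, 0]))
```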
Homomorphic encryption makes privacy-preserving computing possible in a third-party-owned cloud by enabling computation on the encrypted data of users. However, software implementations of homomorphic encryption are very slow on general-purpose processors. With the emergence of 'FPGAs as a service', hardware acceleration of computationally heavy workloads in the cloud is becoming popular. In this paper we propose HEAWS, a domain-specific coprocessor architecture for accelerating homomorphic function evaluation on encrypted data using high-performance FPGAs available in the Amazon AWS cloud. To the best of our knowledge, we are the first to report hardware acceleration of homomorphic encryption using Amazon AWS FPGAs. Utilizing the massive size of the AWS FPGAs, we design a high-performance and parallel coprocessor architecture for the FV homomorphic encryption scheme, which has become popular for computing exact arithmetic on encrypted data. We design parallel building blocks and apply pipeline processing at different levels of the implementation hierarchy, and on top of these optimizations we instantiate multiple parallel coprocessors in the FPGA to execute several homomorphic computations simultaneously. While the absolute computation time can be reduced by deploying more computational resources, the efficiency of the HW/SW communication interface plays an important role in homomorphic encryption, as it is both computation- and data-intensive. Our implementation utilizes the state-of-the-art 512-bit XDMA high-bandwidth communication feature available in the AWS Shell to reduce the overhead of HW/SW data transfer. Moreover, we explore the design space to identify an optimal off-chip data transfer strategy for feeding the parallel coprocessors in a time-shared manner. As a result of these optimizations, our AWS-based accelerator can perform 613 homomorphic multiplications per second for a parameter set that enables homomorphic computations of depth 4. Finally, we benchmark an artificial neural network for privacy-preserving forecasting of energy consumption in a Smart Grid application and observe a five-fold speedup.
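The abstract above emphasizes feeding multiple coprocessors in a time-shared manner from the host. The following is a purely conceptual host-side model of that idea, with Python threads standing in for FPGA coprocessor instances; the coprocessor count, job payloads, and the placeholder `offload` function are illustrative assumptions, not the AWS Shell/XDMA driver API.

```python
# Conceptual model of time-shared dispatch of homomorphic jobs to several
# parallel coprocessors; threads stand in for FPGA coprocessor instances.
import queue
import threading

NUM_COPROCESSORS = 4               # hypothetical number of instantiated coprocessors

def offload(ct_a, ct_b):
    # Placeholder for DMA transfer + homomorphic multiplication on the FPGA.
    return [a * b for a, b in zip(ct_a, ct_b)]

def worker(jobs, results):
    # One worker per coprocessor: pull a job, execute it, store the result.
    while True:
        job = jobs.get()
        if job is None:            # sentinel: stop this coprocessor
            break
        idx, ct_a, ct_b = job
        results[idx] = offload(ct_a, ct_b)
        jobs.task_done()

if __name__ == "__main__":
    pending = [([1, 2, 3], [4, 5, 6])] * 16      # dummy ciphertext pairs
    results = [None] * len(pending)
    jobs = queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(NUM_COPROCESSORS)]
    for t in threads:
        t.start()
    for i, (a, b) in enumerate(pending):
        jobs.put((i, a, b))        # time-share one queue across all coprocessors
    jobs.join()
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    print(results[0])
```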
This paper describes an FPGA cryptographic architecture that combines the elliptic-curve-based Ed25519 digital signature algorithm and the X25519 key establishment scheme in a single module. Cryptographically, these are high-security elliptic curve cryptography algorithms with short key sizes and impressive execution times in software. Our goal is to provide a lightweight FPGA module that enables them on resource-constrained devices, specifically for IoT applications. In addition, we aim at extensibility with customisable countermeasures against timing and differential power analysis side-channel attacks and fault-injection attacks. For the former, we offer a choice between time-optimised and constant-time execution, with or without Z-coordinate randomisation and base point blinding; for the latter, we offer enabling or disabling default-case statements in the FSM descriptions. To obtain compactness and, at the same time, fast execution, we make maximum use of the DSP slices on the FPGA. We designed a single arithmetic unit that is flexible enough to support operations with two moduli as well as non-modular arithmetic. In addition, our design benefits from in-place memory management, local storage of inputs in the DSP slices' pipeline registers, and distributed memory, which together eliminate a memory-access bottleneck. The flexibility is offered by a micro-code-supported instruction-set architecture. Our design targets 7-Series Xilinx FPGAs and is prototyped on a Zynq SoC. The base design, which combines Ed25519 and X25519 in a single module, requires only around 11.1 K LUTs, 2.6 K registers, and 16 DSP slices. At an 82 MHz clock, it achieves 1.6 ms for signature generation and 3.6 ms for signature verification of a 1024-bit message. Moreover, the design can be optimised for X25519 only, which gives the most compact FPGA implementation compared to previously published X25519 implementations.
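As a reference for the arithmetic such a module must support, the sketch below implements X25519 Montgomery-ladder scalar multiplication over GF(2^255 - 19) following RFC 7748, in plain Python. It only clarifies the field and curve operations involved; it is not the FPGA design and includes none of the countermeasures discussed above.

```python
# Reference sketch of X25519 (RFC 7748) scalar multiplication, shown only to
# illustrate the modular arithmetic the FPGA arithmetic unit must support.
P = 2**255 - 19          # prime modulus of Curve25519
A24 = 121665             # (486662 - 2) / 4, Montgomery-ladder constant

def decode_scalar(k: bytes) -> int:
    # Clamp the 32-byte private scalar as specified by X25519.
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")

def x25519(k: bytes, u: bytes) -> bytes:
    """Montgomery-ladder scalar multiplication on Curve25519."""
    kk = decode_scalar(k)
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)   # mask the top bit
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        kt = (kk >> t) & 1
        swap ^= kt
        if swap:                       # conditional swap (constant-time in HW)
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = kt
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * pow((da - cb) % P, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    return ((x2 * pow(z2, P - 2, P)) % P).to_bytes(32, "little")

if __name__ == "__main__":
    base_u = (9).to_bytes(32, "little")    # Curve25519 base point u = 9
    secret = bytes(range(32))              # dummy private key, for illustration
    print(x25519(secret, base_u).hex())    # corresponding public key
```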
We present a domain-specific co-processor to speed up Saber, a post-quantum key encapsulation mechanism competing in the NIST Post-Quantum Cryptography standardization process. Contrary to most lattice-based schemes, Saber does not use NTT-based polynomial multiplication. We follow a hardware-software co-design approach: the execution is performed on an ARM core, and only the most computationally expensive operation, i.e., polynomial multiplication, is offloaded to the co-processor to obtain a compact design. We exploit the idea of distributed computing at the micro-architectural level together with novel algorithmic optimizations to achieve approximately a 6x speedup with respect to optimized software at a small area cost.
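Since Saber avoids the NTT, the offloaded operation is a polynomial multiplication over a power-of-two modulus, where reduction is a simple bit mask. The sketch below shows a negacyclic schoolbook multiplication with Saber-like parameters (n = 256, q = 2^13); it illustrates the operation moved to the co-processor, not necessarily the paper's exact micro-architecture.

```python
# Illustrative sketch of Saber-style polynomial multiplication in
# Z_q[x]/(x^N + 1) with q = 2^13; reduction modulo q is just a bit mask,
# which is one reason the scheme maps well to simple MAC-based hardware.
N = 256                    # Saber polynomial length
Q_MASK = (1 << 13) - 1     # q = 2^13

def saber_poly_mul(a, b):
    """Negacyclic schoolbook multiplication of coefficient lists modulo (x^N + 1, 2^13)."""
    acc = [0] * N
    for i in range(N):
        ai = a[i]
        for j in range(N):
            k = i + j
            if k < N:
                acc[k] += ai * b[j]
            else:
                acc[k - N] -= ai * b[j]    # x^N = -1: wrap with a sign flip
    return [c & Q_MASK for c in acc]       # lazy reduction at the end

# Example: multiply two all-ones polynomials and inspect the first coefficients.
print(saber_poly_mul([1] * N, [1] * N)[:4])
```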