The Secure Hash Algorithm-256 (SHA-256) is a cryptographic hash function used in a wide variety of applications, ranging from Internet of Things micro-devices to high-performance systems. This paper studies a set of SHA-256 implementations on a field-programmable gate array (FPGA) developed using the Open Computing Language (OpenCL). These implementations apply several optimization techniques to improve their throughput. The reported results show that a combination of OpenCL optimization techniques yields an implementation with a 90x speed-up over an unoptimized OpenCL implementation. Moreover, the best optimized implementation achieves a throughput of 3973 Mbps, which is 4.3 times higher than the best previously published HLS-based SHA-256 implementation and also higher than previously published implementations written in a hardware description language. To our knowledge, this work is the first to propose an OpenCL-based FPGA implementation of SHA-256 together with an OpenCL-based optimization methodology.
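To make the throughput metric concrete, the following is a minimal software sketch, not the paper's OpenCL kernel. It hashes a buffer with Python's standard `hashlib` and expresses the rate in Mbps (megabits of input hashed per second). The helper names `sha256_hex`, `throughput_mbps`, and `measure_software_throughput` are hypothetical, and the assumption that 1 Mbps equals 10^6 bits per second is ours, since the abstract does not define the unit.

```python
import hashlib
import time


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string (software reference)."""
    return hashlib.sha256(data).hexdigest()


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Convert bytes hashed in `seconds` to megabits per second.

    Assumes 1 Mbps = 10**6 bits per second (our assumption, not the paper's).
    """
    return (num_bytes * 8) / seconds / 1e6


def measure_software_throughput(total_bytes: int = 1 << 20) -> float:
    """Time one SHA-256 pass over a zero-filled buffer and return Mbps.

    This measures the host CPU only; it says nothing about FPGA performance.
    """
    payload = bytes(total_bytes)
    start = time.perf_counter()
    hashlib.sha256(payload).digest()
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return throughput_mbps(total_bytes, elapsed)


if __name__ == "__main__":
    print(sha256_hex(b"abc"))
    print(f"{measure_software_throughput():.1f} Mbps on this host")
```

A figure obtained this way depends on the host CPU and on the `hashlib` backend, so it is only a rough point of comparison for the 3973 Mbps reported above.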
Using the Open Computing Language (OpenCL) increases productivity when developing applications on high-performance reconfigurable heterogeneous computing (HPRHC) systems. However, the hardware that OpenCL compilers produce for field-programmable gate arrays (FPGAs) can suffer from severe performance bottlenecks that are challenging to resolve. The problem is compounded by the fact that the generated netlist is disorganized, making it mostly unreadable and only partially visible to designers. This paper proposes an in-FPGA instrumentation method and a new framework for extracting the FPGA-cycle-accurate timing performance of OpenCL-based designs. The results show that the execution model chosen for an OpenCL-based design strongly affects its timing performance when it is not properly implemented. Our framework is implemented on an HPRHC platform that contains a CPU and two Arria 10 FPGAs, and it is evaluated with a wide variety of benchmarks of differing complexity. On the reported benchmarks, the average logic overhead of one inserted instrument is 0.2% of the adaptive look-up tables (ALUTs) and 0.1% of the registers in an FPGA. This resource utilization is between 1.5 and 6 times lower than that reported in the best previously published works. The scalability of the framework is also evaluated by inserting up to 50 instruments. With 50 instruments inserted, the average logic utilization per instrument is 0.19% of the ALUTs and 0.17% of the registers in the FPGA.