As the development of Internet of Things (IoT), the data exchanged through the network has significantly increased. To secure the sensitive data with user’s personal information, it is necessary to encrypt the transmitted data. Since resource-constrained wireless devices are typically used for IoT services, it is required to optimize the performance of cryptographic algorithms which are computation-intensive tasks. In this paper, we present efficient implementations of ARX-based Korean Block Ciphers (HIGHT and LEA) with CounTeR (CTR) mode of operation, and CTR_DRBG, one of the most widely used DRBGs (Deterministic Random Bit Generators), on 8-bit AVR Microcontrollers (MCUs). Since 8-bit AVR MCUs are widely used for various types of IoT devices, we select it as the target platform in this paper. We present an efficient implementation of HIGHT and LEA by making full use of the property of CTR mode, where the nonce value is fixed, and only the counter value changes during the encryption. On our implementation, the cost of additional function calls occurred by the generation of look-up table can be reduced. With respect to CTR_DRBG, we identified several parts that do not need to be computed. Thus, precomputing those parts in offline and using them online can result in performance improvements for CTR_DRBG. Furthermore, we applied several optimization techniques by making full use of target devices’ characteristics with AVR assembly codes on 8-bit AVR MCUs. Our proposed table generation way can reduce the cost for building a precomputation table by around 6.7% and 9.1% in the case of LEA and HIGHT, respectively. Proposed implementations of LEA and HIGHT with CTR mode on 8-bit AVR MCUs provide 6.3% and 3.8% of improved performance, compared with the previous best results, respectively. Our implementations are the fastest compared to previous LEA and HIGHT implementations on 8-bit AVR MCUs. In addition, the proposed CTR_DRBG implementations on AVR provide better performance by 37.2% and 8.7% when the underlying block cipher is LEA and HIGHT, respectively.
With the advent of IoT and Cloud computing service technology, the size of user data to be managed and file data to be transmitted has been significantly increased. To protect users’ personal information, it is necessary to encrypt it in secure and efficient way. Since servers handling a number of clients or IoT devices have to encrypt a large amount of data without compromising service capabilities in real-time, Graphic Processing Units (GPUs) have been considered as a proper candidate for a crypto accelerator for processing a huge amount of data in this situation. In this paper, we present highly efficient implementations of block ciphers on NVIDIA GPUs (especially, Maxwell, Pascal, and Turing architectures) for environments using massively large data in IoT and Cloud computing applications. As block cipher algorithms, we choose AES, a representative standard block cipher algorithm; LEA, which was recently added in ISO/IEC 29192-2:2019 standard; and CHAM, a recently developed lightweight block cipher algorithm. To maximize the parallelism in the encryption process, we utilize Counter (CTR) mode of operation and customize it by using GPU’s characteristics. We applied several optimization techniques with respect to the characteristics of GPU architecture such as kernel parallelism, memory optimization, and CUDA stream. Furthermore, we optimized each target cipher by considering the algorithmic characteristics of each cipher by implementing the core part of each cipher with handcrafted inline PTX (Parallel Thread eXecution) codes, which are virtual assembly codes in CUDA platforms. With the application of our optimization techniques, in our implementation on RTX 2070 GPU, AES and LEA show up to 310 Gbps and 2.47 Tbps of throughput, respectively, which are 10.7% and 67% improved compared with the 279.86 Gbps and 1.47 Tbps of the previous best result. In the case of CHAM, this is the first optimized implementation on GPUs and it achieves 3.03 Tbps of throughput on RTX 2070 GPU.
XTS-AES is a disk encryption mode of operation that uses the block cipher AES. Several studies have been conducted to improve the encryption speed using XTS-AES according to the increasing disk size. Among them, there are researches on parallel encryption of XTS-AES using GPU. Although these studies focus on parallel encryption of AES, optimization for the entire XTS mode has not been performed. The reason is that the α j computation process included in XTS mode is not suitable for parallel operation. Therefore, in this paper, we proposed several techniques for high-speed encryption in GPU by modifying XTS-AES into a form that is advantageous for parallel operation. The core idea is to pre-calculate the α j calculation on the CPU into a form that is easy to operate on the GPU. To achieve this goal, we analyzed the α j calculation process and present the parts that can be optimized. First, we presented a method that can replace multiple operations with a single table reference through the analyzed α j computation progress. Thereafter, we proposed a method that can be calculated by partially skipping the entire α j computation process that must be sequentially calculated through the table reference technique. For the proposed optimization implementation, we presented various results for evaluating the optimal implementation. In addition, we compared the performance of XTS-AES OpenSSL implementation on CPU and our proposed optimization implementation on GPU.
With the development of the Internet of Things (IoT) and cloud computing technology, various cryptographic systems have been proposed to protect increasing personal information. Recently, Post-Quantum Cryptography (PQC) algorithms have been proposed to counter quantum algorithms that threaten public key cryptography. To efficiently use PQC in a server environment dealing with large amounts of data, optimization studies are required. In this paper, we present optimization methods for FrodoKEM and NewHope, which are the NIST PQC standardization round 2 competition algorithms in the Graphics Processing Unit (GPU) platform. For each algorithm, we present a part that can perform parallel processing of major operations with a large computational load using the characteristics of the GPU. In the case of FrodoKEM, we introduce parallel optimization techniques for matrix generation operations and matrix arithmetic operations such as addition and multiplication. In the case of NewHope, we present a parallel processing technique for polynomial-based operations. In the encryption process of FrodoKEM, the performance improvements have been confirmed up to 5.2, 5.75, and 6.47 times faster than the CPU implementation in FrodoKEM-640, FrodoKEM-976, and FrodoKEM-1344, respectively. In the encryption process of NewHope, the performance improvements have been shown up to 3.33 and 4.04 times faster than the CPU implementation in NewHope-512 and NewHope-1024, respectively. The results of this study can be used in the IoT devices server or cloud computing service server. In addition, the results of this study can be utilized in image processing technologies such as facial recognition technology.
As the technology of Internet of Things (IoT) evolves, abundant data is generated from sensor nodes and exchanged between them. For this reason, efficient encryption is required to keep data in secret. Since low-end IoT devices have limited computation power, it is difficult to operate expensive ciphers on them. Lightweight block ciphers reduce computation overheads, which are suitable for low-end IoT platforms. In this paper, we implemented the optimized CHAM block cipher in the counter mode of operation, on 8-bit AVR microcontrollers (i.e., representative sensor nodes). There are four new techniques applied. First, the execution time is drastically reduced, by skipping eight rounds through pre-calculation and look-up table access. Second, the encryption with a variable-key scenario is optimized with the on-the-fly table calculation. Third, the encryption in a parallel way makes multiple blocks computed in online for CHAM-64/128 case. Fourth, the state-of-art engineering technique is fully utilized in terms of the instruction level and register level. With these optimization methods, proposed optimized CHAM implementations for counter mode of operation outperformed the state-of-art implementations by 12.8%, 8.9%, and 9.6% for CHAM-64/128, CHAM-128/128, and CHAM-128/256, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.