SUMMARY: This paper presents a software-based parallel cryptographic solution built on a massive-parallel memory-embedded SIMD matrix (MTX) for data-storage systems. MTX can contain up to 2,048 2-bit processing elements, connected by a flexible switching network, and supports 2-bit, 2,048-way bit-serial and word-parallel operations with a single command. A next-generation SIMD matrix, MX-2, has also been developed by expanding the processing-element capability of MTX from 2-bit to 4-bit processing. These SIMD matrix architectures have been verified as a better alternative for repeated arithmetic and logical operations in multimedia applications with low power consumption. Moreover, we propose combining Content Addressable Memory (CAM) technology with the massive-parallel memory-embedded SIMD matrix architecture to enable fast pipelined table-lookup coding. Since both arithmetic-logical operations and table-lookup coding execute extremely fast on these architectures, encryption and decryption algorithms can be executed efficiently. Evaluation of the CAM-less and CAM-enhanced massive-parallel SIMD matrix processors on the Advanced Encryption Standard (AES), a widely used cryptographic algorithm, shows that a throughput of up to 2.19 Gbps is possible. This covers several standard data-storage transfer specifications, such as SD, CF (Compact Flash), USB (Universal Serial Bus), and SATA (Serial Advanced Technology Attachment). Consequently, the massive-parallel SIMD matrix architecture is well suited to protecting private information on several data-storage media. A further advantage of the software-based solution is that the implemented cryptographic algorithm can be flexibly updated to a safer future algorithm. The massive-parallel memory-embedded SIMD matrix architecture (MTX and MX-2) is therefore a promising solution for the integrated realization of real-time cryptographic algorithms with low power dissipation and small silicon-area consumption. Key words: matrix-processing architecture, SIMD, bit-serial and word-parallel, CAM, table-lookup coding, cryptographic algorithm, AES
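As an illustration of the table-lookup coding that the abstract above credits for the AES throughput, the following C sketch models the SubBytes step in plain scalar code: the S-box is built once from the standard Rijndael construction, and each byte substitution is then a pure table lookup, the kind of operation a CAM-enhanced SIMD matrix would issue across all processing elements with a single command rather than in a loop. The function and table names here are hypothetical and are not part of the MTX/MX-2 instruction set.

```c
#include <stdint.h>

static uint8_t SBOX[256];

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) p ^= a;
        uint8_t carry = a & 0x80;
        a = (uint8_t)(a << 1);
        if (carry) a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n) {
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Build the S-box once: S(x) = affine(inverse(x)), with inverse(0) := 0. */
static void init_sbox(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; y < 256 && x != 0; y++)
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        SBOX[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* SubBytes on a 16-byte AES state: nothing but 16 independent table lookups. */
static void sub_bytes(uint8_t state[16]) {
    for (int i = 0; i < 16; i++)
        state[i] = SBOX[state[i]];
}
```

In the scalar model the lookup is a loop; on the architectures described above, the same table resides in the CAM and every processing element performs its lookup in the same pipelined cycle, which is where the reported throughput comes from.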
This paper presents secure data processing with a massive-parallel single-instruction multiple-data (SIMD) matrix for embedded system-on-chip (SoC) designs in digital-convergence mobile devices. Recent mobile devices must employ technology for securing private information, such as cipher processing, to prevent the leakage of personal data. However, this adds to the device's required specifications, particularly for the cipher implementation: fast processing, low power consumption, low hardware cost, adaptability, and simple end-user operation for maintaining a safe state. To satisfy these security-related requirements, we propose interleaved-bitslice processing, which combines two processing concepts (bitslice processing and interleaved processing), for parallel block-cipher processing with five confidentiality modes on mobile processors. Furthermore, we adopt a massive-parallel SIMD matrix processor (MX-1) for interleaved-bitslice processing to verify the effectiveness of the parallel block-cipher implementation. From the block ciphers approved by the Federal Information Processing Standards (FIPS), the Data Encryption Standard (DES), Triple-DES, and Advanced Encryption Standard (AES) algorithms are selected as implementation targets. For the AES algorithm, which is the main focus of this paper, the MX-1 implementation requires up to 93% fewer clock cycles per byte than other conventional mobile processors, and the MX-1 results are almost constant across all confidentiality modes. The practical energy efficiency of parallel block-cipher processing on the MX-1 evaluation board was found to be about 4.8 times higher than that of a BeagleBoard-xM, a single-board computer based on the ARM Cortex-A8 mobile processor. Furthermore, to improve on single-bit logical operations, we propose a multi-bit logical library for interleaved-bitslice cipher processing on MX-1, which yields the smallest clock-cycle count among related studies. Consequently, interleaved-bitslice block-cipher processing with five confidentiality modes on MX-1 is effective for implementing parallel block-cipher processing on several digital-convergence mobile devices.
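The bitslice half of the interleaved-bitslice method described above can be sketched in portable C: bit i of many independent blocks is packed into machine word i, so one bitwise instruction acts on all blocks at once, and interleaving supplies those independent blocks even under feedback confidentiality modes. This is a minimal conceptual sketch with hypothetical helper names; it does not use the MX-1 instruction set or its register model.

```c
#include <stdint.h>

#define LANES 64  /* one independent block (or stream) per bit of a uint64_t */

/* Transpose 64 bytes, one per lane, into 8 bit-plane words:
 * slice[i] holds bit i of every lane's value. */
static void to_bitslice(const uint8_t in[LANES], uint64_t slice[8]) {
    for (int bit = 0; bit < 8; bit++) {
        uint64_t plane = 0;
        for (int lane = 0; lane < LANES; lane++)
            plane |= (uint64_t)((in[lane] >> bit) & 1u) << lane;
        slice[bit] = plane;
    }
}

/* Inverse transpose, back to one byte per lane. */
static void from_bitslice(const uint64_t slice[8], uint8_t out[LANES]) {
    for (int lane = 0; lane < LANES; lane++) {
        uint8_t v = 0;
        for (int bit = 0; bit < 8; bit++)
            v |= (uint8_t)(((slice[bit] >> lane) & 1u) << bit);
        out[lane] = v;
    }
}

/* Example round step: XOR one key byte into all 64 lanes with at most
 * 8 word-wide operations instead of 64 byte-wide ones. */
static void xor_key_bitsliced(uint64_t slice[8], uint8_t key_byte) {
    for (int bit = 0; bit < 8; bit++)
        if ((key_byte >> bit) & 1u)
            slice[bit] ^= ~(uint64_t)0;  /* flip this bit in every lane at once */
}
```

In a full bitsliced cipher, the S-boxes and permutations are likewise expressed as bitwise word operations, so the cost per block is dominated by uniform logic instructions, which is consistent with the nearly constant cycle counts across confidentiality modes reported above.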
Several multimedia applications have recently been implemented on mobile devices, including digital image compression, video compression, and audio processing. Furthermore, Artificial Intelligence (AI) processing has grown in popularity, requiring mobile devices to handle large amounts of data. The processing core in a mobile device therefore requires high performance, programmability, and versatility. Multimedia applications for mobile devices typically consist of repeated arithmetic and table-lookup coding operations. A Content Addressable Memory-based massive-parallel SIMD matriX core (CAMX) is presented to increase the processing speed of both operation types. The CAMX serves as a CPU-core accelerator for mobile devices: it supports highly parallel processing and is equipped with two CAM modules for high-speed repeated arithmetic and table-lookup coding. Because it can handle logical, arithmetic, search, and shift operations in parallel, the CAMX offers strong performance, programmability, and versatility on mobile devices. This paper shows that the CAMX can process repeated arithmetic and table-lookup coding operations in parallel; single-precision floating-point addition over 1024 entries completes in 5613 clock cycles without embedding a dedicated floating-point arithmetic unit. This cycle count, obtained with the two's-complement, instruction-reduced floating-point addition, is 59% lower than that of the straightforward floating-point addition implementation. The straightforward floating-point addition is improved into a two's-complement, instruction-reduced algorithm; to support this, the paper proposes an instruction-reduction architecture that modifies the CAMX so that data in the left and right CAM modules can be accessed directly from the preserve register. The CAMX thus achieves high performance, programmability, and versatility without embedding a dedicated processing unit. Moreover, assuming the CAMX operates at 0.1, 0.5, 1.0, or 1.5 GHz, it outperforms an ARM core using NEON and the Vector Floating-Point (VFP) unit for floating-point additions over approximately 4500 or more parallel data elements. In addition, the CAMX is compared, at the same operating frequency, with related works that use software instructions, a dedicated floating-point arithmetic unit, or both. The results show that a CAMX with 128-bit, 1024-entry CAM modules achieves higher performance than related works that use only software instructions and those that combine software instructions with a dedicated floating-point arithmetic unit. © 2023 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
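The "two's-complement, instruction-reduced" floating-point addition mentioned above can be illustrated with a simplified scalar C model: instead of selecting between separate add and subtract paths based on the operands' sign bits, both significands are folded into signed (two's-complement) form so that a single signed addition covers every sign combination. This is an assumption-laden sketch for normal, finite inputs only (no rounding, subnormals, overflow, infinities, or NaNs) and is not the CAMX algorithm itself.

```c
#include <stdint.h>
#include <string.h>

/* Simplified IEEE-754 single-precision addition for normal, finite inputs. */
static float fadd_sketch(float fa, float fb) {
    uint32_t a, b;
    memcpy(&a, &fa, 4);
    memcpy(&b, &fb, 4);

    int32_t ea = (int32_t)((a >> 23) & 0xFF);
    int32_t eb = (int32_t)((b >> 23) & 0xFF);

    /* If the exponents are far apart, the smaller operand cannot change the
     * (unrounded) result, so return the larger one directly. */
    if (ea - eb > 27) return fa;
    if (eb - ea > 27) return fb;

    /* Align both significands (implicit leading 1 plus 3 guard bits)
     * to the larger exponent while they are still non-negative. */
    int32_t e = ea > eb ? ea : eb;
    int64_t ma = ((int64_t)((a & 0x7FFFFFu) | 0x800000u) << 3) >> (e - ea);
    int64_t mb = ((int64_t)((b & 0x7FFFFFu) | 0x800000u) << 3) >> (e - eb);

    /* Fold each sign bit into its significand: the two's-complement trick
     * that lets one signed addition replace separate add/subtract paths. */
    if (a >> 31) ma = -ma;
    if (b >> 31) mb = -mb;
    int64_t m = ma + mb;     /* covers +/+, +/-, -/+ and -/- uniformly */

    uint32_t sign = 0;
    if (m < 0) { sign = 1u << 31; m = -m; }
    if (m == 0) return 0.0f;

    /* Renormalize so the leading 1 sits at bit 26 (23 fraction + 3 guard bits). */
    while (m >= (int64_t)1 << 27) { m >>= 1; e++; }
    while (m <  (int64_t)1 << 26) { m <<= 1; e--; }

    uint32_t r = sign | ((uint32_t)e << 23) | ((uint32_t)(m >> 3) & 0x7FFFFFu);
    float out;
    memcpy(&out, &r, 4);
    return out;
}
```

Merging the sign handling into the significand arithmetic removes the compare-and-branch work per element, which is the kind of instruction reduction that matters most when the same operation is replayed across a thousand CAM entries in lockstep.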