We introduce the usage of hardware performance counters (HPCs) as a new method that allows very precise access to known side channels and also allows access to many new side channels. Many current architectures provide hardware performance counters, which allow the profiling of software during runtime. Though they allow detailed profiling they are noisy by their very nature; HPC hardware is not validated along with the rest of the microprocessor. They are meant to serve as a relative measure and are most commonly used for profiling software projects or operating systems. Furthermore they are only accessible in restricted mode and can only be accessed by the operating system. We discuss this security model and we show first implementation results, which confirm that HPCs can be used to profile relatively short sequences of instructions with high precision. We focus on cache profiling and confirm our results by rerunning a recently published time based cache attack in which we replaced the time profiling function by HPCs.
In this article we present the fastest known implementation of a modular multiplication for a 160-bit standard compliant elliptic curve (secp160r1) for 8-bit micro controller which are typically used in WSNs. The major part (77%) of the processing time for an elliptic curve operation such as ECDSA or EC Diffie-Hellman is spent on modular multiplication. We present an optimized arithmetic algorithm which significantly speed up ECC schemes. The reduced processing time also yields a significantly lower energy consumption of ECC schemes. With our implementation results we can show that a 160-bit modular multiplication can be performed in 0.39 ms on an 8-bit AVR processor clocked at 7.37 MHz. This brings the vision of asymmetric cryptography in the field of WSNs with all its benefits for key-distribution and authentication a step closer to reality.
In this work, elliptic curve cryptography (ECC) is used to make an efficient implementation of a public-key cryptography algorithm on the ARM Cortex-M0+. The goal of this implementation is to make not only a fast, but also a very low-power software implementation. To aid in the elliptic curve parameter selection, the energy consumption of different instructions on the ARM Cortex-M0+ was measured and it was found that there is a variation of up to 22.5% between different instructions. The instruction set architecture (ISA) and energy measurements were used to make a simulation of both a binary curve and a prime curve implementation, and the former was found to have a slightly faster execution time with a lower power consumption. Binary curve arithmetic use instructions which requires less energy than prime curve arithmetic on the target platform. A new field multiplication algorithm is proposed, called López-Dahab with fixed registers, which is an optimization of the López-Dahab (LD) algorithm. The proposed algorithm has a performance improvement of 15% over the LD with rotating registers algorithm (which is the current fastest optimization of the LD algorithm). A software implementation that uses the proposed algorithm was made in C and assembly, and on average our implementation of a random point multiplication requires 34.16 µJ, whereas our fixed point multiplication requires 20.63 µJ. The energy consumption of our implementation beats all known software implementations on embedded platforms, of a point multiplication, on the same equivalent security level by a factor of 7.4.
Abstract-This paper describes the systematic design methods of an embedded co-processor for a post quantum secure McEliece cryptosystem. A hardware/software co-design has been targeted for the realization of McEliece in practice on low-cost embedded platforms. Design optimizations take place when choosing system parameters, algorithm transformations, architecture choices, and arithmetic primitives. The final architecture consists of an 8-bit PicoBlaze softcore for flexibility and several parallel acceleration units for throughput optimization. A prototype of the coprocessor is implemented on a Spartan-3an xc3s1400an FPGA, using less than 30% of its resources. On this FPGA, one McEliece decryption of an 80-bit security level takes less than 100K clock cycles corresponding to only 1 ms at a clock frequency of 92 MHz. This is 10 times faster and 3.8 times smaller than the existing design.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.