The demand for Artificial Intelligence (AI)-based solutions is increasing exponentially in all application fields, including low-power devices at the edge. However, due to their limited computational capabilities, these devices, which run Central Processing Units (CPUs) tailored to embedded applications, are typically not optimized to run complex neural networks. Providing ad-hoc extensions to the instruction set architecture of a RISC-V processor can be a viable solution to address this issue. In this work, we propose the use of the PyTorch Graph Lowering (Glow)-LLVM toolchain to understand the impact of the compiled code of AI models on a RISC-V machine and to extend its instruction set to improve runtime performance. This approach allows code profiling, detection of computational bottlenecks, and provisioning of the necessary CPU enhancements, which can be implemented in the LLVM backend before hardware implementation. After profiling well-known Artificial Neural Networks (ANNs) quantized to int8 (in particular, a single perceptron, RESNET18, VGG11, and LENET5), we identified and devised three additional instructions named LWM, LWA, and LWS (Load Word-and-Multiply, -Add, and -Subtract). As a result, we obtained an edge AI-oriented processor description, significantly improved in terms of inference time and program density and ready for hardware implementation. For 128×128 RGB images, the custom extensions enable up to a 13× speedup compared to RV32I and 5× compared to RV32IM, with up to 11.7% smaller code size. This paper also systematically highlights the main methodological steps required to include new instructions in an LLVM backend.

INDEX TERMS RISC-V, AI, RISC-V custom instructions extensions, LLVM, instruction set architecture, hardware-software co-design.