The demand for Artificial Intelligence (AI)-based solutions is increasing exponentially in all application fields, including low-power devices at the edge. However, due to their limited computational capabilities, these devices, which run Central Processing Units (CPUs) tailored to embedded applications, are typically not optimized to run complex neural networks. Providing ad-hoc extensions to the instruction set architecture of a RISC-V processor can be a viable solution to address this issue. In this work, we propose the use of the PyTorch Graph Lowering (Glow)-LLVM toolchain to understand the impact of the compiled code of AI models on a RISC-V machine and to extend its instruction set to improve runtime performance. This approach allows code profiling, detection of computational bottlenecks, and provisioning of the necessary CPU enhancements, which can be implemented in the LLVM backend before hardware implementation. After profiling well-known Artificial Neural Networks (ANNs) quantized to int8 (in particular, a single perceptron, RESNET18, VGG11, and LENET5), we identified and devised three additional instructions named LWM, LWA, and LWS (Load Word-and-Multiply, -Add, and -Subtract). As a result, we obtained an edge AI-oriented processor description, significantly improved in terms of inference time and program density and ready for hardware implementation. For 128×128 RGB images, the custom extensions enable up to a 13× speedup compared to RV32I and 5× compared to RV32IM, with up to 11.7% smaller code size. This paper also systematically highlights the main methodological steps required to include new instructions in an LLVM backend.

INDEX TERMS RISC-V, AI, RISC-V custom instructions extensions, LLVM, instruction set architecture, hardware-software co-design.