Linear algebra computational kernels based on byte and sub-byte integer data formats underpin many classes of applications, ranging from Deep Learning to Pattern Matching. Porting these applications from the cloud to edge and mobile devices would enable significant improvements in security, safety, and energy efficiency. However, despite their low memory and energy demands, their intrinsically high computational intensity makes these workloads challenging to execute on highly resource-constrained devices. In this paper, we present BiSon-e, a novel RISC-V based architecture that accelerates linear algebra kernels based on narrow integer computations on edge processors by performing Single Instruction Multiple Data (SIMD) operations on off-the-shelf scalar Functional Units (FUs). Our architecture is built upon the binary segmentation technique, which significantly reduces the memory footprint and the arithmetic intensity of linear algebra kernels that operate on narrow data sizes. We integrate BiSon-e into a complete RISC-V based System-on-Chip (SoC), synthesized, placed, and routed in 65nm and 22nm technologies, introducing a negligible 0.07% area overhead with respect to the baseline architecture. Our experimental evaluation shows that, when computing the Convolution and Fully-Connected layers of the AlexNet and VGG-16 Convolutional Neural Networks (CNNs) with 8-, 4-, and 2-bit data, our solution achieves speedups of up to 5.6×, 13.9×, and 24×, respectively, over the scalar implementation on a single RISC-V core, and improves the energy efficiency of string matching tasks by 5× compared to a RISC-V-based Vector Processing Unit (VPU).
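
To make the binary segmentation idea concrete, the following is a minimal software sketch of how a dot product of narrow unsigned integers can be obtained from a single wide multiplication. It illustrates the general technique only and is not a description of the BiSon-e datapath: the function name `binseg_dot`, the choice of clustering width, and the restriction to unsigned inputs are assumptions made for this example, and Python's arbitrary-precision integers stand in for a fixed-width scalar multiplier.

```python
def binseg_dot(a, b, width):
    """Dot product of two equal-length vectors of unsigned `width`-bit
    integers, computed with one wide multiplication (binary segmentation).

    Hypothetical helper for illustration; not taken from the paper.
    """
    a, b = list(a), list(b)
    n = len(a)
    assert n > 0 and n == len(b), "vectors must be non-empty and equal length"
    assert all(0 <= x < (1 << width) for x in a + b), "inputs must fit in `width` bits"

    # Clustering width: each slot must hold a sum of up to n products of two
    # `width`-bit values, i.e. at most 2*width + ceil(log2(n)) bits.
    cw = 2 * width + (n - 1).bit_length()

    # Pack `a` in ascending slot order and `b` in descending slot order, so
    # that every matching pair a[i] * b[i] lands in slot n-1 of the product.
    packed_a = sum(x << (i * cw) for i, x in enumerate(a))
    packed_b = sum(y << ((n - 1 - i) * cw) for i, y in enumerate(b))

    # One scalar multiplication produces all pairwise products at once.
    product = packed_a * packed_b

    # Slot n-1 of the product holds the dot product; mask out its neighbours.
    return (product >> ((n - 1) * cw)) & ((1 << cw) - 1)


if __name__ == "__main__":
    # Four 2-bit elements per operand: 1*3 + 2*1 + 3*2 + 0*3 = 11
    print(binseg_dot([1, 2, 3, 0], [3, 1, 2, 3], width=2))
```

With 2-bit inputs and four elements, each slot is 6 bits wide, so each packed operand occupies 24 bits. On a fixed-width multiplier, the operand and result widths bound how many elements can be packed per operation, which is why narrower data formats allow more elements to be processed per multiplication.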