We present a versatile open-source framework designed to facilitate efficient, numerically tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed and without requiring modifications to that code. We deploy this framework on a cutting-edge platform comprising a Power9 host, an OpenCAPI link, and a Xilinx Virtex UltraScale+ FPGA. The framework demonstrates a systematic improvement in accuracy per energy cost across diverse High-Performance Computing (HPC) workloads with varying numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161, DenseNet169, and VGG11, in conjunction with two datasets, two computer formats, and 27 distinct intermediate arithmetic datapaths. Our approach consistently reduces energy consumption across all cases; a notable example is the reduction by factors of 3.3× for IEEE754-32 and 1.4× for Bfloat16 during ImageNet inference with ResNet50. This is accomplished while maintaining accuracies of 82.3% and 86%, comparable to those achieved with conventional Floating-Point Units (FPUs). For SSH computation, our method achieves fully reproducible results using double-precision words, surpassing the accuracy of conventional double- and quad-precision arithmetic in FPUs. Our approach enhances SSH computation accuracy by a minimum of 5× and 27× compared to IEEE754-64 and IEEE754-128, respectively, resulting in 5.6× and 15.1× improvements in accuracy per power cost.
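To illustrate why the intermediate arithmetic datapath matters for reproducibility, the following minimal Python sketch contrasts a dot product that rounds after every accumulation step with one that accumulates exactly and rounds only once at the end. It is a software illustration only and does not reproduce the framework's generated hardware datapaths; the function names `dot_rounded` and `dot_exact` are hypothetical, and exact rational arithmetic is used here as a stand-in for a sufficiently wide accumulator.

```python
from fractions import Fraction


def dot_rounded(a, b):
    """Dot product that rounds to double precision after every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y  # each multiply and each add is rounded to a double
    return acc


def dot_exact(a, b):
    """Dot product accumulated exactly, rounded to double once at the end.

    Assumption: exact rationals stand in for a wide hardware accumulator.
    """
    acc = Fraction(0)
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)  # no rounding in the accumulation
    return float(acc)  # single final rounding


# Same mathematical dot product, two summation orders.
a1, b1 = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]
a2, b2 = [1e16, -1e16, 1.0], [1.0, 1.0, 1.0]

print(dot_rounded(a1, b1), dot_rounded(a2, b2))  # order-dependent results
print(dot_exact(a1, b1), dot_exact(a2, b2))      # identical results
```

With per-step rounding, the result depends on the order in which terms are accumulated, which is what happens when a parallel reduction is scheduled differently between runs. With exact accumulation and a single final rounding, every order yields the same double-precision word.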