This paper presents the implementation of a neural network accelerator optimized for speed and energy efficiency in embedded machine learning. Specifically, we explore power reduction at the hardware level through a systolic-array architecture and low-precision data representations, including quantization. We present a comprehensive analysis comparing a floating-point (FP16) baseline accelerator with a quantized (INT16) version on an FPGA. We adapted the FP16 modules to operate on INT16 values, employing data shifts to increase value density while maintaining accuracy. Through single-convolution experiments, we assess energy consumption and numerical error. The paper includes a detailed description of the FP16 accelerator, the transition to quantization, mathematical and implementation insights, instrumentation for power measurement, and a comparative analysis of power consumption and convolution error. Our results aim to identify a pattern in 16-bit quantization that achieves significant power savings with minimal loss of accuracy.
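
To make the idea of shift-based INT16 quantization concrete, the sketch below compares a small floating-point convolution against its quantized counterpart, selecting a power-of-two scale (a shift) so the data occupies as much of the signed 16-bit range as possible, then measuring the resulting error. This is only an illustrative software model under our own assumptions; the scale-selection heuristic and all names are hypothetical and do not reflect the accelerator's actual hardware implementation.

```cpp
// Illustrative sketch: shift-based INT16 quantization of a 1-D convolution,
// compared against a floating-point reference. Not the accelerator's RTL.
#include <cstdint>
#include <cmath>
#include <cstdio>
#include <vector>
#include <algorithm>

// Pick a power-of-two scale (left shift) so the largest magnitude in the
// tensor still fits in a signed 16-bit integer (assumed heuristic).
static int choose_shift(const std::vector<float>& data) {
    float max_abs = 0.0f;
    for (float v : data) max_abs = std::max(max_abs, std::fabs(v));
    if (max_abs == 0.0f) return 0;
    // Largest shift s such that max_abs * 2^s <= 32767.
    return static_cast<int>(std::floor(std::log2(32767.0f / max_abs)));
}

// Quantize one value with rounding and saturation to the INT16 range.
static int16_t quantize(float v, int shift) {
    float scaled = std::round(v * std::ldexp(1.0f, shift));
    scaled = std::min(std::max(scaled, -32768.0f), 32767.0f);
    return static_cast<int16_t>(scaled);
}

int main() {
    // Toy single convolution: 8-tap input, 3-tap kernel, valid padding.
    std::vector<float> x = {0.12f, -0.48f, 0.91f, 0.33f, -0.77f, 0.05f, 0.64f, -0.29f};
    std::vector<float> w = {0.25f, -0.50f, 0.125f};

    const int sx = choose_shift(x), sw = choose_shift(w);
    std::vector<int16_t> xq(x.size()), wq(w.size());
    for (size_t i = 0; i < x.size(); ++i) xq[i] = quantize(x[i], sx);
    for (size_t i = 0; i < w.size(); ++i) wq[i] = quantize(w[i], sw);

    double sum_sq_err = 0.0;
    for (size_t i = 0; i + w.size() <= x.size(); ++i) {
        float ref = 0.0f;   // floating-point reference accumulation
        int32_t acc = 0;    // 32-bit accumulator for INT16 products
        for (size_t k = 0; k < w.size(); ++k) {
            ref += x[i + k] * w[k];
            acc += static_cast<int32_t>(xq[i + k]) * wq[k];
        }
        // Undo both input scales to compare against the reference.
        float deq = std::ldexp(static_cast<float>(acc), -(sx + sw));
        sum_sq_err += (deq - ref) * (deq - ref);
        printf("out[%zu]: fp=% .6f  int16=% .6f\n", i, ref, deq);
    }
    printf("RMSE = %.3e\n", std::sqrt(sum_sq_err / (x.size() - w.size() + 1)));
    return 0;
}
```

In this sketch the shift plays the role of the data shifts mentioned above: scaling values up toward the 16-bit limit before rounding keeps the quantization step small relative to the signal, which is what allows the integer datapath to approach the accuracy of the floating-point one.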