In the evolving landscape of computer vision, the integration of machine learning algorithms with cutting-edge hardware platforms is increasingly pivotal, especially in the context of disruptive healthcare systems. This study introduces an optimized implementation of a Convolutional Neural Network (CNN) on the Basys3 FPGA, designed specifically for accelerating the classification of cytotoxicity in human kidney cells. Addressing the challenges posed by constrained dataset sizes, compute-intensive AI algorithms, and hardware limitations, the approach presented in this paper leverages efficient image augmentation and pre-processing techniques to enhance both prediction accuracy and the training efficiency. The CNN, quantized to 8-bit precision and tailored for the FPGA’s resource constraints, significantly accelerates training by a factor of three while consuming only 1.33% of the power compared to a traditional software-based CNN running on an NVIDIA K80 GPU. The network architecture, composed of seven layers with excessive hyperparameters, processes downscale grayscale images, achieving notable gains in speed and energy efficiency. A cornerstone of our methodology is the emphasis on parallel processing, data type optimization, and reduced logic space usage through 8-bit integer operations. We conducted extensive image pre-processing, including histogram equalization and artefact removal, to maximize feature extraction from the augmented dataset. Achieving an accuracy of approximately 91% on unseen images, this FPGA-implemented CNN demonstrates the potential for rapid, low-power medical diagnostics within a broader IoT ecosystem where data could be assessed online. This work underscores the feasibility of deploying resource-efficient AI models in environments where traditional high-performance computing resources are unavailable, typically in healthcare settings, paving the way for and contributing to advanced computer vision techniques in embedded systems.