This paper presents a design space exploration for edge machine learning using the MathWorks FPGA Deep Learning Processor IP, provided in the HDL Deep Learning toolbox. With the growing demand for real-time machine learning applications, there is a critical need for efficient, low-latency hardware that can operate at the edge of the network, close to the data source. The HDL Deep Learning toolbox provides a flexible and customizable platform for deploying deep learning models on FPGAs, enabling inference acceleration for embedded IoT applications. In this study, we focus on how the number of parallel processing elements affects the performance and resource utilization of the FPGA-based processor. By analyzing the trade-offs between accuracy, speed, energy efficiency, and hardware resource utilization, we aim to provide guidance on design choices for FPGA-based implementations. Our evaluation is conducted on the AMD-Xilinx ZC706 development board, which serves as the target device for our experiments. To assess performance comprehensively, we consider all compatible Convolutional Neural Networks available within the HDL Deep Learning toolbox.