The size of neural networks used in deep learning is increasing and varies significantly with the requirements of real-life applications. This growing network size, together with scalability requirements, poses significant challenges for high-performance implementations of deep neural networks (DNNs). Conventional platforms, such as graphics processing units (GPUs) and application-specific integrated circuits (ASICs), are either less efficient or less flexible. Consequently, this article presents a system-on-chip (SoC) solution for DNN acceleration, in which an ARM processor controls the overall execution and offloads computationally intensive operations to a hardware accelerator. The system is implemented on an SoC development board. Experimental results show that the proposed system achieves a speed-up of 22.3× for a network architecture of size 64×64 compared with a native implementation on a dual-core ARM Cortex-A9 processor. To generalize the performance of the complete system, a mathematical formula is presented that computes the total execution time for any architecture size. Validation is performed using Epileptic Seizure Recognition as the target case study. Finally, the proposed solution is compared with various state-of-the-art solutions in terms of execution time, scalability, and clock frequency.