<p>Reliable and fast channel estimation is crucial for next-generation wireless networks supporting a wide range of vehicular and low-latency services. Recently, deep learning (DL) based channel estimation has been explored as an efficient alternative to conventional least-squares (LS) and linear minimum mean square error (LMMSE) channel estimation. Unlike LMMSE, DL methods need no prior knowledge of channel statistics. However, most of these approaches have not been realized on a system-on-chip (SoC), and preliminary studies show that their complexity exceeds that of the entire physical layer (PHY); the high latency of DL inference is another concern. This paper considers the design and implementation of deep neural network (DNN) augmented LS-based channel estimation (LSDNN) on a Zynq multi-processor SoC (ZMPSoC). We demonstrate the performance gain over conventional LS and LMMSE channel estimation schemes. Via software-hardware co-design, word-length optimization, and reconfigurable architectures, we establish the superiority of the LSDNN architecture over LS and LMMSE for a wide range of signal-to-noise ratios (SNRs), numbers of pilots, preamble types, and wireless channels. Further, we evaluate the performance, power, and area (PPA) of the LS and LSDNN application-specific integrated circuit (ASIC) implementations in 45 nm technology, and show that word-length optimization can substantially improve the PPA of the proposed architecture.</p>
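<p>To make the LSDNN pipeline concrete, the NumPy sketch below shows its two stages: an element-wise LS estimate at the pilot subcarriers, followed by a small fully connected network that refines the noisy LS estimate. This is a minimal sketch, not the paper's implementation: the pilot count, layer sizes, activation, and (random, untrained) weights are illustrative placeholders, whereas the paper's DNN is trained offline and mapped to fixed-point hardware.</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)

# Synthetic pilot transmission (placeholder values, not from the paper).
n_pilots = 64
x_pilot = np.exp(1j * np.pi / 4) * np.ones(n_pilots)   # known pilot symbols
h_true = (rng.standard_normal(n_pilots)
          + 1j * rng.standard_normal(n_pilots)) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal(n_pilots)
               + 1j * rng.standard_normal(n_pilots))
y = h_true * x_pilot + noise                            # received pilots

# Stage 1 -- conventional LS estimate: element-wise division by the pilots.
h_ls = y / x_pilot

# Stage 2 -- DNN refinement: stack the real and imaginary parts of the LS
# estimate into one real-valued vector and pass it through a small MLP.
# The weights here are random stand-ins, so the output is meaningful only
# as a demonstration of the data flow, not of estimation quality.
def relu(v):
    return np.maximum(v, 0.0)

w1 = 0.05 * rng.standard_normal((2 * n_pilots, 2 * n_pilots))
w2 = 0.05 * rng.standard_normal((2 * n_pilots, 2 * n_pilots))

z = np.concatenate([h_ls.real, h_ls.imag])
z = relu(w1 @ z)
z = w2 @ z
h_lsdnn = z[:n_pilots] + 1j * z[n_pilots:]

print("LS MSE:", np.mean(np.abs(h_ls - h_true) ** 2))
</code></pre>
<p>In a fixed-point SoC or ASIC realization, the matrix-vector products above are the word-length-sensitive operations: shrinking their operand widths is what drives the PPA improvements reported for the proposed architecture.</p>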