In recent times, the trends in the very large scale integration (VLSI) industry have become multi-dimensional: reduced energy consumption, smaller silicon footprint, precise results, lower power dissipation, faster response, and so on. To meet these needs, hardware architectures must be reliable and robust against these challenges. Recently, neural networks and deep learning have begun to reshape the research paradigm significantly; such models involve parameters on the order of millions, nonlinear activation functions, convolutional operations for feature extraction, softmax regression for classification, generative adversarial networks, and more. These operations impose enormous computation and memory overheads. Presently available DSP processors are incapable of performing these operations efficiently, and they commonly suffer from memory bottlenecks, performance degradation, and compromised accuracy. Moreover, if a large silicon area is powered simultaneously to accelerate computation in parallel, the considerable heat generated puts the ICs at significant risk of burning out. Hence, the dark silicon constraint has been introduced to reduce heat dissipation without sacrificing accuracy. Similarly, various algorithms have been adapted to design DSP processors suited to fast execution of neural networks, activation functions, convolutional neural networks, and generative adversarial networks. In this review, we illustrate recent developments in hardware that accelerate the efficient implementation of deep learning networks with enhanced performance. The techniques investigated in this review are expected to guide future research on hardware optimization for high-performance computing.