Deep learning networks (DLNs), which use multilayer neural networks for tasks such as multiclass classification, have shown promising results in wind-power forecasting applications. However, improving their training through proper hyperparameter settings and techniques such as regularisation and Adam-based optimisation remains a challenge in designing DLNs for time-series data. Moreover, choosing the most appropriate parameters for a DLN model that solves the wind-power forecasting problem means weighing many training-related settings, such as the optimiser, activation function, batch size, and dropout rate. Reinforcement learning (RL) schemes offer a smart way to explore suitable initial parameters for a DLN model because they balance exploration and exploitation. Therefore, the present study uses a Q-learning scheme to determine suitable hyperparameters for four DLN models. To verify the effectiveness of the developed temporal convolutional network (TCN) model, experiments were conducted with five different sets of initial TCN parameters, each obtained from the output of the Q-learning computation. The experimental results showed that the TCN achieved a mean absolute percentage error (MAPE) of 1.41% for 168 h wind-power prediction. To evaluate the effectiveness of the hyperparameter selection for the proposed model, four DLN-based power-forecasting models were compared: TCN, long short-term memory (LSTM), recurrent neural network (RNN), and gated recurrent unit (GRU). Overall, the TCN model achieved higher prediction accuracy than the canonical recurrent networks (i.e., the GRU, LSTM, and RNN models).
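The abstract does not specify how the Q-learning scheme is formulated, so the following is only a minimal sketch under stated assumptions, not the study's method. It assumes a small discrete search space over the optimiser, activation function, batch size, and dropout rate; a state is one hyperparameter configuration; an action nudges one hyperparameter to a neighbouring choice; and the reward is the negative validation MAPE. Training a real TCN is replaced by `surrogate_mape`, a hypothetical stand-in function, and all names and values (`SEARCH_SPACE`, the learning rate `alpha`, discount `gamma`, exploration rate `epsilon`) are illustrative assumptions. Epsilon-greedy action selection provides the exploration/exploitation balance mentioned above.

```python
import random

# Hypothetical discrete search space (assumed values, not taken from the study).
SEARCH_SPACE = {
    "optimizer": ["adam", "rmsprop", "sgd"],
    "activation": ["relu", "tanh"],
    "batch_size": [32, 64, 128],
    "dropout": [0.0, 0.2, 0.5],
}
KEYS = list(SEARCH_SPACE)


def surrogate_mape(config):
    """Hypothetical stand-in for training a TCN and returning validation MAPE (%)."""
    penalty = {"adam": 0.0, "rmsprop": 0.5, "sgd": 1.5}[config["optimizer"]]
    penalty += {"relu": 0.0, "tanh": 0.3}[config["activation"]]
    penalty += abs(config["batch_size"] - 64) / 64
    penalty += abs(config["dropout"] - 0.2) * 2
    return 1.5 + penalty


def decode(state):
    """Map a tuple of choice indices to a hyperparameter dictionary."""
    return {k: SEARCH_SPACE[k][i] for k, i in zip(KEYS, state)}


def actions(state):
    """Actions: move one hyperparameter index up or down by one, if in range."""
    acts = []
    for dim, idx in enumerate(state):
        for delta in (-1, 1):
            if 0 <= idx + delta < len(SEARCH_SPACE[KEYS[dim]]):
                acts.append((dim, delta))
    return acts


def step(state, action):
    """Apply an action and return the next state."""
    dim, delta = action
    nxt = list(state)
    nxt[dim] += delta
    return tuple(nxt)


def q_learning(episodes=200, horizon=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over configurations; returns the best config seen."""
    rng = random.Random(seed)
    q = {}      # Q-table: (state, action) -> value, default 0.0
    cache = {}  # memoise rewards so each configuration is "trained" once

    def reward(s):
        if s not in cache:
            cache[s] = -surrogate_mape(decode(s))
        return cache[s]

    best = None
    for _ in range(episodes):
        # Each episode starts from a random configuration.
        state = tuple(rng.randrange(len(SEARCH_SPACE[k])) for k in KEYS)
        for _ in range(horizon):
            acts = actions(state)
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: q.get((state, x), 0.0))
            nxt = step(state, a)
            r = reward(nxt)
            best_next = max(q.get((nxt, b), 0.0) for b in actions(nxt))
            old = q.get((state, a), 0.0)
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, a)] = old + alpha * (r + gamma * best_next - old)
            if best is None or r > reward(best):
                best = nxt
            state = nxt
    return decode(best), -reward(best)
```

Calling `q_learning()` returns the best configuration visited and its surrogate MAPE. Initialising unseen Q-values to 0.0 while all rewards are negative makes untried actions look attractive, which adds exploration on top of the epsilon-greedy rule; in a real setting, each call to the reward function would train and validate a model, so caching rewards matters.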