This study considers a parallel dedicated machine scheduling problem whose objective is to minimize the total tardiness of the jobs allocated to machines. The problem is NP-hard. Unlike classical parallel machine scheduling, each job can be processed only on one of the machines dedicated to its job type, which is defined in advance, and each machine can process at most one job at a time. To obtain a high-quality schedule in terms of total tardiness, we propose a machine scheduler based on double deep Q-learning. For the training phase, we reformulate the scheduling problem within the reinforcement learning framework and define states, actions, and rewards that capture the occurrences of setups, tardiness, and the statuses of the allocated job types. The proposed scheduler repeatedly finds better Q-values for minimizing the tardiness of allocated jobs by updating the weights of a neural network. We then evaluate the scheduling performance of the proposed scheduler by comparing it with conventional schedulers. The results show that the proposed scheduler outperforms the conventional ones. In particular, on two datasets representing extra-large scheduling problems, our model outperforms an existing genetic algorithm by 12.32% and 29.69%, respectively.
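To make the two central quantities concrete, the following minimal Python sketch computes the total tardiness of a schedule on dedicated machines and the standard double deep Q-learning target, in which the online network selects the next action and the target network evaluates it. It is an illustration under stated assumptions, not the implementation used in this study: the job tuple layout, the function names `total_tardiness` and `double_dqn_target`, the keying of machines by job type, the omission of setup times, and the default discount factor are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical job record for this sketch: (job_type, processing_time, due_date).
Job = Tuple[str, float, float]


def total_tardiness(sequences: Dict[str, List[Job]]) -> float:
    """Return the sum of max(0, completion time - due date) over all jobs.

    `sequences` maps each dedicated machine, keyed here by the job type it
    serves (an assumption of this sketch), to its ordered list of jobs.
    Jobs on a machine run back to back from time 0; setup times are ignored.
    """
    total = 0.0
    for machine_type, jobs in sequences.items():
        clock = 0.0
        for job_type, proc_time, due_date in jobs:
            if job_type != machine_type:
                raise ValueError("job assigned to a machine of another type")
            clock += proc_time  # a machine processes one job at a time
            total += max(0.0, clock - due_date)
    return total


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor
    done: bool = False,
) -> float:
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Two machines dedicated to job types "A" and "B".
schedule = {
    "A": [("A", 3.0, 4.0), ("A", 2.0, 4.0)],
    "B": [("B", 5.0, 3.0)],
}
example_tardiness = total_tardiness(schedule)
example_target = double_dqn_target(-1.0, [0.2, 0.9, 0.5], [1.0, 2.0, 3.0], gamma=0.5)
```

In the example schedule, the second type-A job finishes at time 5 against a due date of 4, and the type-B job finishes at time 5 against a due date of 3, so `example_tardiness` is 3.0. Decoupling action selection from action evaluation, as in `double_dqn_target`, is what distinguishes double deep Q-learning from the single-network target and is intended to reduce overestimation of Q-values.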