Big data analysis can provide valuable insights that are not easily obtained from data at traditional, smaller scales. However, scheduling tasks in big data systems is challenging because of the volume and diversity of the data. To address this, a scheduling model based on the Markov decision process is proposed, in which the deep Q-network algorithm is used to schedule directed acyclic graph tasks. To improve the model further, the gradient strategy algorithm is introduced, yielding a hybrid algorithm that combines the two. In the experiments, when the dataset size was about 500, the hybrid algorithm achieved a recall rate of 0.96, outperforming the gradient strategy algorithm (0.83), the deep Q-network algorithm (0.79), and the estimated earliest completion time algorithm (0.63). The estimated earliest completion time algorithm had longer training times across dataset sizes, whereas the hybrid algorithm's training time was slightly longer than that of the gradient strategy algorithm and slightly shorter than that of the deep Q-network algorithm. Overall, the proposed hybrid algorithm achieved the highest recall among the compared methods at a moderate training cost, which makes it of practical value for engineering scheduling problems.
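
To make the Markov decision process framing concrete, the following is a minimal sketch of how directed acyclic graph task scheduling might be cast as an episodic decision problem. It is an illustration under stated assumptions, not the model proposed here: the class `DagSchedulingEnv`, the reward (negative growth in makespan), the two-worker setup, and the helper `greedy_policy` are all hypothetical. The greedy rule simply picks the ready task that would finish earliest; a learned agent such as a deep Q-network would take the place of `greedy_policy`.

```python
from dataclasses import dataclass, field


@dataclass
class DagSchedulingEnv:
    """Toy MDP for DAG task scheduling (illustrative assumption, not the paper's model)."""
    durations: dict   # task name -> run time
    preds: dict       # task name -> set of prerequisite task names
    n_workers: int = 2
    finish: dict = field(default_factory=dict)       # task name -> scheduled finish time
    worker_free: list = field(default_factory=list)  # time at which each worker becomes free

    def reset(self):
        """Start a new episode and return the initial set of actions."""
        self.finish = {}
        self.worker_free = [0.0] * self.n_workers
        return self.ready_tasks()

    def ready_tasks(self):
        """Actions: unscheduled tasks whose prerequisites are all scheduled."""
        return sorted(
            t for t in self.durations
            if t not in self.finish
            and self.preds.get(t, set()).issubset(self.finish)
        )

    def completion_time(self, task):
        """Finish time if `task` is placed on the earliest-free worker."""
        deps_done = max((self.finish[p] for p in self.preds.get(task, set())), default=0.0)
        start = max(min(self.worker_free), deps_done)
        return start + self.durations[task]

    def makespan(self):
        """Latest finish time among the tasks scheduled so far."""
        return max(self.finish.values(), default=0.0)

    def step(self, task):
        """Schedule `task`; the reward is the negative growth in makespan."""
        before = self.makespan()
        end = self.completion_time(task)
        worker = self.worker_free.index(min(self.worker_free))
        self.worker_free[worker] = end
        self.finish[task] = end
        reward = -(self.makespan() - before)
        done = len(self.finish) == len(self.durations)
        return self.ready_tasks(), reward, done


def greedy_policy(env, ready):
    """Hypothetical baseline: choose the ready task that would finish earliest."""
    return min(ready, key=env.completion_time)


def run_episode(env, policy):
    """Roll out one episode and return (makespan, total reward)."""
    ready = env.reset()
    total, done = 0.0, False
    while not done:
        ready, reward, done = env.step(policy(env, ready))
        total += reward
    return env.makespan(), total


if __name__ == "__main__":
    env = DagSchedulingEnv(
        durations={"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0},
        preds={"c": {"a"}, "d": {"b", "c"}},
    )
    print(run_episode(env, greedy_policy))
```

With this reward, the return of an episode equals the negative of the final makespan, so maximizing return is the same as minimizing total completion time. That design choice is one common option among several and is assumed here only for illustration.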