The permutation flow shop scheduling problem (PFSP) is a classic combinatorial optimization problem and a common organizational structure in real-world production settings. Because conventional scheduling approaches struggle to cope with the complex and dynamic production environment of the PFSP, this study proposes an end-to-end deep reinforcement learning method that minimizes the maximum completion time (makespan). We first model the PFSP as a Markov decision process, defining the relevant states, actions, and reward function. A key novelty of our approach is the use of disjunctive graphs to represent PFSP state information. To extract the topological information embedded in the disjunctive graph, we design a policy network based on a graph isomorphism network (GIN) and train it with proximal policy optimization (PPO). The proposed method is compared with six baseline methods on randomly generated instances and on the Taillard benchmark. The experimental results show that our approach outperforms the baselines in both makespan and computation time: the makespan is reduced by up to 183.2 h on randomly generated instances and by up to 188.4 h on the Taillard benchmark, while the computation time is reduced by up to 18.70 s and 18.16 s, respectively.
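To make the GIN-based policy concrete, the following is a minimal sketch (not the authors' implementation) of a policy network that embeds job nodes of a disjunctive graph with GIN layers and produces a softmax over unscheduled jobs, as PPO would use for its policy. The node features, the dense adjacency matrix, the action mask, and all layer sizes are illustrative assumptions.

```python
# Hedged sketch of a GIN policy network for PFSP in plain PyTorch.
# All inputs (node features, adjacency, mask) are hypothetical placeholders.
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN layer: h' = MLP((1 + eps) * h + sum over neighbors of h)."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable epsilon
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h, adj):
        # adj: (n, n) dense adjacency of the disjunctive graph; h: (n, dim)
        return self.mlp((1 + self.eps) * h + adj @ h)

class PFSPPolicy(nn.Module):
    """Scores each unscheduled job; the softmax plays the role of pi(a|s)."""
    def __init__(self, in_dim, hid_dim=64, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hid_dim)
        self.gin = nn.ModuleList(GINLayer(hid_dim) for _ in range(n_layers))
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, x, adj, mask):
        # x: (n_jobs, in_dim) node features; mask: True for already-scheduled jobs
        h = self.embed(x)
        for layer in self.gin:
            h = torch.relu(layer(h, adj))
        logits = self.score(h).squeeze(-1)
        logits = logits.masked_fill(mask, float('-inf'))  # forbid scheduled jobs
        return torch.log_softmax(logits, dim=-1)          # log pi(a|s) over jobs

# Hypothetical usage: 5 jobs, 4 features per node, nothing scheduled yet.
policy = PFSPPolicy(in_dim=4)
x = torch.randn(5, 4)
adj = (torch.rand(5, 5) > 0.5).float()
log_probs = policy(x, adj, mask=torch.zeros(5, dtype=torch.bool))
action = torch.distributions.Categorical(logits=log_probs).sample()
```

In such a design, the action mask would let one policy handle the shrinking action set as jobs are appended to the permutation; the PPO update itself (clipped surrogate objective over the sampled log-probabilities) is omitted here for brevity.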