The traditional dynamic weapon target assignment model is a sequence of static weapon target assignment stages: each stage is assigned only after the result of the previous static assignment has been settled. Between two static stages, however, the threat ranking of the targets may change over time, and the traditional model does not take this into account. This paper proposes a "time sampling dynamic weapon assignment model", which divides the decision-making stages by the time interval of data collection, so that it can capture real-time changes in the target threat degree and make timely decisions. Based on this model, a dynamic weapon target assignment method using a reinforcement learning algorithm is designed. A comparative experiment over different sampling time divisions is then carried out, and a better sampling time division method is obtained. Finally, the reinforcement learning algorithm is compared experimentally with a traditional heuristic algorithm. The simulation results show that, compared with the traditional heuristic algorithm, the proposed assignment model and the reinforcement learning algorithm perform better in terms of decision-making timeliness and global considerations.

INDEX TERMS dynamic weapon target assignment; simulation model; reinforcement learning; heuristic algorithm; PPO algorithm.
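
As a purely illustrative sketch of the time-sampling idea, the following Python fragment opens one decision stage per sampling instant and re-evaluates the threat ranking at each of them. Everything in it is an assumption made for illustration and is not taken from the paper: the `Target` fields, the distance-based `threat_degree`, the constant closing speeds, and `greedy_policy`, which merely stands in for a learned (for example PPO) policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Target:
    name: str
    distance: float  # current distance to the defended asset (illustrative units)
    speed: float     # closing speed in distance units per time unit


def threat_degree(target: Target) -> float:
    """Hypothetical threat measure: the closer the target, the higher the threat."""
    return 1.0 / max(target.distance, 1e-9)


def greedy_policy(targets: List[Target]) -> str:
    """Stand-in for a learned policy: name the currently most threatening target."""
    return max(targets, key=threat_degree).name


def run_time_sampled(
    targets: List[Target],
    horizon: float,
    dt: float,
    policy: Callable[[List[Target]], str] = greedy_policy,
) -> List[Tuple[float, str]]:
    """Open one decision stage at each sampling instant 0, dt, 2*dt, ... < horizon.

    Returns a list of (sampling time, target chosen by the policy).
    """
    decisions = []
    n_stages = int(horizon // dt)
    for k in range(n_stages):
        # Decision stage: the policy sees the threat picture as sampled now.
        decisions.append((k * dt, policy(targets)))
        # Advance the (assumed) constant-speed target motion to the next sample.
        targets = [
            Target(t.name, max(t.distance - t.speed * dt, 0.0), t.speed)
            for t in targets
        ]
    return decisions


if __name__ == "__main__":
    scenario = [Target("A", 100.0, 10.0), Target("B", 80.0, 2.0)]
    print(run_time_sampled(scenario, horizon=5, dt=1))
    # [(0, 'B'), (1, 'B'), (2, 'B'), (3, 'A'), (4, 'A')]
    print(run_time_sampled(scenario, horizon=10, dt=5))
    # [(0, 'B'), (5, 'A')]
```

In this toy scenario the fast target A overtakes B in threat between t = 2 and t = 3. A sampling interval of 1 reacts at t = 3, whereas an interval of 5 reacts only at t = 5, which is the kind of trade-off a comparison of sampling time divisions would examine.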