Autonomous driving systems are complex, safety-critical cyber–physical systems that combine awareness of the physical environment with cognitive computing. Deep reinforcement learning is now widely used for decision-making in such systems. However, black-box deep reinforcement learning guarantees neither system safety nor the interpretability of the reward-function settings when faced with complex environments and uncontrolled uncertainties. We therefore propose a formal security reinforcement learning method. First, we propose an environment modeling approach that accounts for the influence of nondeterministic environmental factors, which enables that influence to be quantified precisely. Second, we use the environment model to formalize the structure of a reward machine, which guides the setting of the reward function in reinforcement learning. Third, we generate a control barrier function that constrains the reinforcement learning policy to safer state behavior. Finally, we verify the method's effectiveness for intelligent driving in overtaking and lane-changing scenarios.
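To make the roles of the reward machine and the control barrier function more concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the implementation used in this work: the state names, event labels, reward values, the single-integrator gap model, and the parameters `d_min`, `alpha`, and `dt` are all hypothetical. The reward machine emits a reward on each high-level event of an overtaking maneuver, and the filter caps the speed proposed by the learning agent so that the barrier value h = gap - d_min satisfies the discrete-time condition h_next >= (1 - alpha) * h.

```python
from dataclasses import dataclass

# Hypothetical reward machine for an overtaking task. States, events, and
# reward values are illustrative assumptions, not taken from the paper.
TRANSITIONS = {
    ("follow", "changed_left"): ("beside", 0.2),
    ("beside", "passed_lead"): ("ahead", 0.3),
    ("ahead", "returned_right"): ("done", 1.0),
}


@dataclass
class RewardMachine:
    state: str = "follow"

    def step(self, event: str) -> float:
        """Advance on a high-level event and return the reward it emits."""
        if self.state in ("done", "failed"):
            return 0.0  # terminal states emit no further reward
        if event == "collision":
            self.state = "failed"
            return -1.0
        # Events with no matching transition leave the state unchanged.
        next_state, reward = TRANSITIONS.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def cbf_filter(u_rl, gap, v_lead, d_min=5.0, alpha=0.5, dt=0.1):
    """Cap the ego speed command so the barrier h = gap - d_min decays
    no faster than h_next >= (1 - alpha) * h.

    Assumed model (single integrator): gap_next = gap + (v_lead - u) * dt,
    which gives the bound u <= v_lead + alpha * h / dt.
    """
    h = gap - d_min
    u_max = v_lead + alpha * h / dt
    # Speeds are clipped at zero (no reversing), so the bound cannot be
    # met when u_max is negative, i.e. when the state is already unsafe.
    return max(0.0, min(u_rl, u_max))


if __name__ == "__main__":
    rm = RewardMachine()
    for event in ["changed_left", "passed_lead", "returned_right"]:
        print(event, rm.step(event), rm.state)
    print(cbf_filter(u_rl=30.0, gap=6.0, v_lead=20.0))
```

In this sketch the reward logic is an explicit automaton rather than an opaque scalar function, so it can be inspected, while the filter only overrides the learned action when that action would violate the barrier condition.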