“…$\sqrt{T \log T}$ hard violation when the objective is strongly convex [21]. Safe Reinforcement Learning: Safe reinforcement learning (RL) refers to reinforcement learning under safety constraints and has also received considerable interest [5,17,19,26,46,11,43,16,15,14,29,4,44,9,20,47]. In safe RL, the agent optimizes its policy by interacting with the environment without violating the safety constraints.…”