“…Lastly, Cai et al. [23] present a method for finding RL control policies that satisfy LTL specifications through safe reinforcement learning. Their reward shaping process increases reward density and yields policies that maximize the probability of satisfying the LTL specification, while their safe padding technique keeps exploration safe without affecting the original probabilistic guarantees.…”