“…Safe RL, especially with expected cumulative constraints, has been extensively studied under both model-free approaches (Wei, Liu, and Ying 2022b,a; Wei et al. 2023; Ghosh, Zhou, and Shroff 2022) and model-based approaches (Ding et al. 2021; Liu et al. 2021a; Bura et al. 2021; Singh, Gupta, and Shroff 2020; Ding et al. 2021; Chen, Jain, and Luo 2022). Many works (Liu, Jiang, and Li 2022; Wu et al. 2018; Caramanis, Dimitrov, and Morton 2014) have also studied knapsack constraints, under which the learning process stops as soon as the budget runs out.…”