“…The finite-time error bounds for the gradient TD algorithms [Maei et al, 2010, Maei et al, 2010] were further developed recently in [Dalal et al, 2018b, Liu et al, 2015, Gupta et al, 2019, Xu et al, 2019, Dalal et al, 2020, Kaledin et al, 2020, Wang and Zou, 2020, Ma et al, 2021]. There are also finite-time error bounds for policy gradient and actor-critic methods, e.g., [Kumar et al, 2019, Qiu et al, 2019, Wu et al, 2020, Cen et al, 2020, Bhandari and Russo, 2019, Agarwal et al, 2019, Mei et al, 2020]. These studies address non-robust RL algorithms; in this paper, we design robust RL algorithms and characterize their finite-time error bounds.…”