“…Variance reduction in RL. Variance reduction was originally proposed to accelerate finite-sum stochastic optimization (e.g., Gower et al, 2020; Johnson and Zhang, 2013; Nguyen et al, 2017). The strategy has since been imported into RL, where it improves the sample efficiency of RL algorithms in multiple contexts, including but not limited to policy evaluation (Du et al, 2017; Khamaru et al, 2020; Wai et al, 2019; Xu et al, 2019), RL with a generative model (Sidford et al, 2018a,b; Wainwright, 2019b), asynchronous Q-learning (Li et al, 2020b), and offline RL (Yin et al, 2021).…”