“…RL methods

RL was used in nine studies (9/14, 64%), with algorithms including A3C, DDPG, DQN, Dueling DQN, HER, PI², PPO, and Rainbow (Chi et al, 2018a, 2020; Behr et al, 2019; You et al, 2019; Kweon et al, 2021; Meng et al, 2021, 2022; Cho et al, 2022; Karstensen et al, 2022). Demonstrator data in some form (GAIL, Behavior Cloning, or HD) was used as a training precursor (LfD), in conjunction with other RL algorithms, in four of the studies (4/14, 29%; Chi et al, 2018a; Behr et al, 2019; Kweon et al, 2021; Cho et al, 2022). The SOFA framework (Inria, Strasbourg, France; Faure et al, 2012) was used for training RL models in four studies (4/14, 29%; Behr et al, 2019; Cho et al, 2022; Karstensen et al, 2022; Meng et al, 2022), the Unity engine (Unity Technologies, San Francisco, USA) was used in two studies (2/14, 14%; You et al, 2019; Meng et al, 2021), while the platform used for training was not specified in three…”
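To illustrate the "demonstrator data as a training precursor" pattern mentioned above, the following is a minimal, hypothetical sketch of behavior-cloning pretraining before RL fine-tuning. The network architecture, demonstration tensors, and hyperparameters are illustrative placeholders and are not drawn from any of the cited studies.

```python
# Hypothetical sketch: behavior cloning on demonstrator data as a precursor
# to RL fine-tuning. All names and sizes here are illustrative only.
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Small MLP mapping an observation vector to discrete-action logits."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def pretrain_with_demonstrations(policy: Policy,
                                 demo_obs: torch.Tensor,
                                 demo_actions: torch.Tensor,
                                 epochs: int = 10,
                                 lr: float = 1e-3) -> Policy:
    """Behavior cloning: supervised learning on (observation, action) pairs
    recorded from a demonstrator, before handing the policy to an RL loop."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = policy(demo_obs)          # predict action logits
        loss = loss_fn(logits, demo_actions)  # match demonstrator actions
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# After pretraining, the same network would be fine-tuned by an RL algorithm
# (e.g. PPO or DQN) through interaction with the simulated environment;
# that interaction loop is omitted from this sketch.
```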