“…RL does not require any data; consequently the learning environment plays a crucial role in providing enough and diverse experiences (Jaderberg et al, 2021) in conjunction with carefully crafted reward signals. Many domains are interested in RL research and applications (Leitão & Karnouskos, 2015), such as game theory and distributed systems, but also optimal control, autonomous cars (Shalev-Shwartz et al, 2016) and robotics (Kober et al, 2013; Gupta et al, 2017; Ismail & Sariff, 2018). Games have been part of one of three main historical threads of RL development (Sutton & Barto, 2015).…”