“…Batch reinforcement learning, in both the tabular and function-approximator settings, has long been studied (Lange et al., 2012; Strehl et al., 2010) and remains a highly active area of research (Swaminathan & Joachims, 2015; Jiang & Li, 2015; Thomas & Brunskill, 2016; Farajtabar et al., 2018; Irpan et al., 2019; Jaques et al., 2019). Imitation learning is likewise a well-studied problem (Schaal, 1999; Argall et al., 2009; Hussein et al., 2017) that also continues to attract active research (Kim et al., 2013; Piot et al., 2014; Chemali & Lazaric, 2015; Hester et al., 2018; Ho et al., 2016; Sun et al., 2017; Cheng et al., 2018; Gao et al., 2018). This paper relates most closely to Fujimoto et al. (2018a), who made the critical observation that when conventional DQL-based algorithms are applied to batch reinforcement learning, performance can be very poor, and the algorithm may fail to learn at all.…”