“…The majority of research using RL in healthcare is on dynamic treatment regimes (DTRs), where the goal is to develop effective treatment regimes that adapt dynamically to a patient's changing clinical state and improve long-term outcomes (Yu et al., 2019b). This includes DTRs for diseases such as cancer (Zhao, Kosorok, & Zeng, 2009; Liu, Logan, Liu, Xu, Tang, & Wang, 2017), diabetes (Daskalaki, Scarnato, Diem, & Mougiakakou, 2010; Bothe, Dickens, Reichel, Tellmann, Ellger, Westphal, & Faisal, 2013; Daskalaki, Diem, & Mougiakakou, 2013), anemia (Malof & Gaweda, 2011; Escandell-Montero, Chermisi, Martinez-Martinez, Gomez-Sanchis, Barbieri, Soria-Olivas, Mari, Vila-Francés, Stopper, Gatti, et al., 2014), HIV (Parbhoo, 2014; Parbhoo, Bogojeska, Zazzi, Roth, & Doshi-Velez, 2017; Yu, Dong, Liu, & Ren, 2019a), and mental illnesses (Paredes, Gilad-Bachrach, Czerwinski, Roseway, Rowan, & Hernandez, 2014; Pineau, Guez, Vincent, Panuccio, & Avoli, 2009), as well as DTRs in critical care (Weng, Gao, He, Yan, & Szolovits, 2017; Petersen, Yang, Grathwohl, Cockrell, Santiago, An, & Faissol, 2018).…”