“…Applications in medicine, public policy, internet marketing, and other scientific areas often require estimating an individualized treatment rule (or regime, policy) to maximize the potential benefit. Several successful methods have been developed for estimating an optimal treatment regime, including Q‐learning (Watkins and Dayan, 1992; Murphy, 2005b; Chakraborty et al., 2010; Qian and Murphy, 2011; Song et al., 2015), A‐learning (Robins et al., 2000; Murphy, 2003, 2005a; Moodie and Richardson, 2010; Shi et al., 2018), model‐free methods (Robins et al., 2008; Orellana et al., 2010; Zhang et al., 2012; Zhao et al., 2012, 2015; Athey and Wager, 2017; Linn et al., 2017; Zhou et al., 2017; Zhu et al., 2017; Lou et al., 2018; Qi et al., 2018; Wang et al., 2018), tree or list‐based methods (Laber and Zhao, 2015; Cui et al., 2017; Zhu et al., 2017; Zhang et al., 2018), targeted learning ensembles approach (Díaz et al., 2018), among others.…”