“…For this reason, it has been used in several other works on derivatives pricing and hedging. Various techniques have been considered, such as Q-learning in Halperin (2020) and Cao et al (2021), proximal policy optimization in Chong et al (2021), least squares policy iteration and fitted Q-iteration for American option pricing in Li et al (2009), and batch policy gradient in Buehler et al (2019). Moreover, various other financial problems have been tackled through reinforcement learning in the literature: portfolio management, as in Moody and Wu (1997), Jiang et al (2017), Pendharkar and Cusatis (2018), García-Galicia et al (2019), Wang and Zhou (2020), Ye et al (2020) and Betancourt and Chen (2021); optimal liquidation, as in Bao and Liu (2019); and trading optimization, as in Hendricks and Wilcox (2014), Lu (2017) and Ning et al (2018).…”