“…Examples of reward signals include portfolio return (Jiang et al., 2017; Pendharkar & Cusatis, 2018; Yu et al., 2019), (differential) Sharpe ratio (Du et al., 2016; Pendharkar & Cusatis, 2018), and profit (Du et al., 2016). The benchmark strategies include Constantly Rebalanced Portfolio (CRP) (Yu et al., 2019; Jiang et al., 2017) where at each period the portfolio is rebalanced to the initial wealth distribution among the assets, and the buy‐and‐hold or do‐nothing strategy (Park et al., 2020; Aboussalah, 2020), which does not take any action but rather holds the initial portfolio until the end. The performance measures studied in these papers include the Sharpe ratio (Yu et al., 2019; Wang & Zhou, 2020; Xiong et al., 2018; Jiang et al., 2017; Liang et al., 2018; Park et al., 2020; Wang, 2019), the Sortino ratio (Yu et al., 2019), portfolio returns (Aboussalah, 2020; Liang et al., 2018; Park et al., 2020; Wang, 2019; Xiong et al., 2018; Yu et al., 2019), portfolio values (Jiang et al., 2017; Pendharkar & Cusatis, 2018; Xiong et al., 2018), and cumulative profits (Du et al., 2016).…”
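
The two benchmark strategies and two of the performance measures named in the passage can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the cited papers: the function names are hypothetical, prices enter as per-period price relatives (end price divided by start price), transaction costs are ignored, the Sharpe ratio uses the sample standard deviation without annualization, and the Sortino ratio uses a target return of zero. The cited works may define these quantities differently.

```python
import math
from statistics import mean, stdev


def crp_wealth(price_relatives, weights):
    """Final wealth (starting from 1) when the portfolio is rebalanced
    to the same initial weights at the start of every period."""
    wealth = 1.0
    for x in price_relatives:
        # After rebalancing, one period's growth is the weighted
        # average of the assets' price relatives.
        wealth *= sum(w * xi for w, xi in zip(weights, x))
    return wealth


def buy_and_hold_wealth(price_relatives, weights):
    """Final wealth (starting from 1) when the initial allocation is
    held untouched until the end."""
    holdings = list(weights)
    for x in price_relatives:
        # Each position grows on its own; nothing is ever rebalanced.
        holdings = [h * xi for h, xi in zip(holdings, x)]
    return sum(holdings)


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def sortino_ratio(returns, target=0.0):
    """Mean return above the target divided by downside deviation,
    which penalizes only returns that fall below the target."""
    shortfalls = [min(r - target, 0.0) ** 2 for r in returns]
    downside = math.sqrt(mean(shortfalls))
    return (mean(returns) - target) / downside


# Toy data: asset 1 doubles and then halves, asset 2 stays flat.
relatives = [(2.0, 1.0), (0.5, 1.0)]
weights = (0.5, 0.5)

crp = crp_wealth(relatives, weights)           # 1.5 * 0.75 = 1.125
bah = buy_and_hold_wealth(relatives, weights)  # 0.5 + 0.5 = 1.0

# Per-period returns of the rebalanced portfolio.
crp_returns = [0.5, -0.25]
print(crp, bah, sharpe_ratio(crp_returns), sortino_ratio(crp_returns))
```

In this toy case the rebalanced portfolio ends above its starting wealth while buy-and-hold ends exactly where it started, because rebalancing sells part of the asset that rose before it falls back. This is a property of the chosen numbers, not a general ranking of the two benchmarks.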