“…The restless feature is essential for capturing the risk-time tradeoff but makes the model complicated (see, e.g., Fryer and Harms, 2019). The thinking arm concept is related to the few papers in the literature that study multistage bandits (Keller and Oldale, 2003;Hu, 2014;Green and Taylor, 2016;Wolf, 2018;Kim, 2021;Moroni, 2021).…”