“…Several papers have proposed multi-armed bandit models where surrogate outcomes encode actions' long-term impacts. These include bandit models where poor recommendations cause attrition [Ben-Porat et al, 2022, Bastani et al, 2022] and bandit models where objectives incorporate diversity/boredom considerations [Xie et al, 2022, Cao et al, 2020, Ma et al, 2016]. Wu et al [2017] study a variation on the typical bandit model where actions impact whether a user will return to the system.…”