“…Furthermore, OIM falls in the Combinatorial Multi-Armed Bandit (CMAB) regime, where each arm is a combination of multiple selected items. CMAB has found widespread applications in the real world, such as assortment optimization (Agrawal et al 2019, Oh and Iyengar 2021) and product ranking (Chen et al 2020). Most CMAB models assume either semi-bandit feedback, where the agent can observe the outcome of every item within the selected arm (Gai et al 2012, Chen et al 2013, Kveton et al 2015c, Chen et al 2016b, Wang and Chen 2017), or partial feedback, where the agent can observe the outcomes of only a subset of the selected items (Kveton et al 2015b, Combes et al 2015, Katariya et al 2016, Zong et al 2016, Cheung et al 2019).…”