“…Optimal (sequential) resource allocation is a well-known problem in operations research, with classic examples such as inventory management and portfolio allocation [1,2,3,4]. Some approaches draw on stochastic optimization (e.g., multi-period and multi-stage stochastic optimization [5,6]), but the literature most relevant to this paper is that on the multi-armed bandit (MAB) problem [7,8], specifically its semi-bandit feedback (SBF) setting [9,10,11,12,13,14].…”