We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(√NKT log T), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Ω(√NKT), for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Ω(√T). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms.
We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O( √ N KT log T ), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Ω( √ N KT ), for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewisestationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Ω( √ T ). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms.
Cascading bandit (CB) is a popular model for web search and online advertising. However, the stationary CB model may be too simple to cope with real-world problems, where user preferences may change over time. Considering piecewisestationary environments, two efficient algorithms, GLRT-Ca-scadeUCB and GLRT-CascadeKL-UCB, are developed. Comparing with existing works, the proposed algorithms: i) are free of change-point-dependent information for choosing parameters; ii) have fewer tuning parameters; iii) improve regret upper bounds. We also show that the proposed algorithms are optimal up to logarithm terms by deriving a minimax lower bound Ω( √ N LT ) for piecewise-stationary CB. The efficiency of the proposed algorithms is validated through numerical tests on a real-world benchmark dataset.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.