Improving Online Marketing Experiments with Drifting Multi-armed Bandits

Burtini, Giuseppe; Loeppky, Jason L.; Lawrence, Ramon

doi:10.5220/0005458706300636

Cited by 24 publications

(23 citation statements)

References 4 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Cheung et al (2019); Russac et al (2019) provided an extension of this setting in the linear bandits framework, and Chen et al (2019) provided a method hat is adaptive to V T . In Burtini et al (2015), a modification to the linear model of Thompson sampler is proposed and deals with a general non-stationary means but lacks theoretical supports. Combes and Proutiere (2014) considered nonstationary means, which are Lipschitz continuous and unimodal.…”

Section: Introductionmentioning

confidence: 99%

Generalized non-stationary bandits

Manegueu,

Carpentier,

2021

Preprint

View full text Add to dashboard Cite

In this paper, we study a non-stationary stochastic bandit problem, which generalizes the switching bandit problem. On top of the switching bandit problem (Case a), we are interested in three concrete examples: (b) the means of the arms are local polynomials, (c) the means of the arms are locally smooth, and (d) the gaps of the arms have a bounded number of inflexion points and where the highest arm mean cannot vary too much in a short range. These three settings are very different, but have in common the following: (i) the number of similarly-sized level sets of the logarithm of the gaps can be controlled, and (ii) the highest mean has a limited number of abrupt changes, and otherwise has limited variations. We propose a single algorithm in this general setting, that in particular solves in an efficient and unified way the four problems (a)-(d) mentioned.

show abstract

Section: Introductionmentioning

confidence: 99%

Generalized non-stationary bandits

Manegueu,

Carpentier,

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…In real service, it is common to have a type of non-stationary environment, that is time-varying effect [3,7,8,9,10,11]. However, multi-armed bandit including thompson sampling is sensitive to this irregular condition in nature compared to A/B test, where sample sizes for all variants do not change during an experiment.…”

Section: Introductionmentioning

confidence: 99%

Odds-Ratio Thompson Sampling to Control for Time-Varying Effect

Kim,

Kim

2020

Preprint

View full text Add to dashboard Cite

Multi-armed bandit methods have been used for dynamic experiments particularly in online services. Among the methods, thompson sampling is widely used because it is simple but shows desirable performance [1,2]. Many thompson sampling methods for binary rewards use logistic model that is written in a specific parameterization. In this study, we reparameterize logistic model with odds ratio parameters. This shows that thompson sampling can be used with subset of parameters. Based on this finding, we propose a novel method, "Odds-ratio thompson sampling", which is expected to work robust to time-varying effect. Use of the proposed method in continuous experiment is described with discussing a desirable property of the method. In simulation studies, the novel method works robust to temporal background effect, while the loss of performance was only marginal in case with no such effect. Finally, using dataset from real service, we showed that the novel method would gain greater rewards in practical environment. Problem SettingMany multi-armed bandit applications adopt Batch Update, where arms are played multiple times, then, policy and related parameters are updated with aggregated rewards [2,6]. Batch update, which is sometimes called as delay update, is practical set up, because it requires much less computational resources than online or realtime update. There are many chances that the temporal effect changes concurrent with batch update, making reward probabilities change

show abstract

“…Since behavioral signals in the data evolve with time, predictors of this type are characterized by non-stationarities in their reward sequences. Algorithms that deal with this issue include switching and restless bandits [35,68,131].…”

Section: Introductionmentioning

confidence: 99%

Towards an ethical recommendation framework

Paraschakis

2017

2017 11th International Conference on Research Challenges in Information Science (RCIS)

View full text Add to dashboard Cite

Studies in Computer Science Faculty of Technology and SocietyMalmö University 1. Jevinger, Åse. Toward intelligent goods: characteristics, architectures and applications, 2014, Doctoral dissertation. 2. Dahlskog, Steve. Patterns and procedural content generation in digital games: automatic level generation for digital games using game design patterns, 2016, Doctoral dissertation. 3. Fabijan, Aleksander. Developing the right features: the role and impact of customer and product data in software product development, 2016, Licentiate thesis. 4. Paraschakis, Dimitris. Algorithmic and ethical aspects of recommender systems in e-commerce, 2018, Licentiate thesis.

show abstract

Improving Online Marketing Experiments with Drifting Multi-armed Bandits

Cited by 24 publications

References 4 publications

Generalized non-stationary bandits

Generalized non-stationary bandits

Odds-Ratio Thompson Sampling to Control for Time-Varying Effect

Towards an ethical recommendation framework

Contact Info

Product

Resources

About