“…It then becomes of interest to study policies that prescribe the actions maximizing the reward over all interactions (e.g., $\max \sum_{t=1}^{T} R_t \mid A_t, X_t$). There is a broad literature on the topic (see, e.g., Berry and Fristedt, 1985; Audibert et al., 2009; Scott, 2010), and within the marketing literature researchers are already approaching the personalization problem as a contextual bandit problem (Hauser et al., 2009, 2014; Schwartz et al., 2013). In our view, a key next step in the development of personalized persuasive systems is to acknowledge the inherent uncertainty in the coupling between user, message content, and observed behavior by exploring different policies.…”
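To make the contextual bandit framing concrete, the following is a minimal sketch. It is not the authors' method and does not come from the cited works. At each interaction $t$ a context $X_t$ (e.g., a user segment) is observed, an action $A_t$ (e.g., a message variant) is chosen, and a binary reward $R_t$ is observed, with the aim of maximizing $\sum_{t=1}^{T} R_t$. For illustration only, it assumes discrete contexts, Bernoulli rewards, and Beta-Bernoulli Thompson sampling as the exploration policy. The function name `thompson_contextual_bandit` and the response rates in `true_p` are hypothetical.

```python
import random


def thompson_contextual_bandit(true_p, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling for a contextual bandit.

    Hypothetical setup: true_p[x][a] is the (made-up) probability that a user
    in context x responds (R_t = 1) to action/message a. Returns the total
    reward sum_{t=1}^{T} R_t and the per-(context, action) pull counts.
    """
    rng = random.Random(seed)
    n_contexts = len(true_p)
    n_actions = len(true_p[0])
    # Beta(1, 1) prior for each (context, action) pair:
    # alpha tracks 1 + successes, beta tracks 1 + failures.
    alpha = [[1.0] * n_actions for _ in range(n_contexts)]
    beta = [[1.0] * n_actions for _ in range(n_contexts)]
    pulls = [[0] * n_actions for _ in range(n_contexts)]
    total_reward = 0
    for _ in range(horizon):
        x = rng.randrange(n_contexts)  # observe context X_t
        # Draw a plausible response rate per action; act greedily on the draws.
        # Drawing from the posterior (rather than using its mean) is what makes
        # the policy keep exploring actions it is still uncertain about.
        samples = [rng.betavariate(alpha[x][a], beta[x][a]) for a in range(n_actions)]
        a = max(range(n_actions), key=lambda i: samples[i])  # choose A_t
        r = 1 if rng.random() < true_p[x][a] else 0  # observe reward R_t
        # Update the posterior of the chosen (context, action) pair only.
        alpha[x][a] += r
        beta[x][a] += 1 - r
        pulls[x][a] += 1
        total_reward += r
    return total_reward, pulls


# Hypothetical example: two user segments, two message variants, where the
# best message differs by segment.
true_p = [[0.05, 0.20], [0.25, 0.05]]
reward, pulls = thompson_contextual_bandit(true_p, horizon=5000)
```

Over time the sampled policy should concentrate on message 1 for segment 0 and on message 0 for segment 1. It still occasionally tries the other message, because the draws come from posteriors that retain uncertainty. That is one simple way of "exploring different policies" in the sense of the passage.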