The Multi-Armed Bandit (MAB) problem has been extensively studied to address real-world challenges of sequential decision-making. In this setting, an agent selects the best action to perform at time-step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff of each action is kept stationary by the environment over time. Nevertheless, in many real-world applications this assumption does not hold, and the agent faces a non-stationary environment, that is, one with a changing reward distribution. We therefore present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, settings in which the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history together with an arm-specific sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(·). In particular, we propose a pessimistic (f=min), an optimistic (f=max), and an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments evaluates the f-dsw TS algorithm against both stationary and non-stationary state-of-the-art TS baselines. We use synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift: sudden/abrupt, incremental, gradual, and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis of microbial organisms in the tropical air ecosystem.
The f-dsw TS approach emerges as the best-performing MAB algorithm. At least one version of f-dsw TS performs better than the baselines in the synthetic environments, demonstrating the robustness of f-dsw TS under different concept-drift types. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
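The abstract does not give the exact update rules, but the mechanism it describes can be sketched for Bernoulli rewards: each arm keeps both a discounted Beta posterior and a sliding window of its most recent rewards, one Thompson sample is drawn from each estimate, and the two samples are combined with the aggregation function f. Class and parameter names below are illustrative, not taken from the paper.

```python
import random
from collections import deque

class FDSWThompsonSampling:
    """Hypothetical sketch of f-dsw TS for Bernoulli rewards.

    Per arm, maintains (1) discounted Beta-posterior counts and
    (2) a sliding window of recent rewards; at selection time it
    draws one Thompson sample from each and aggregates them via f.
    """

    def __init__(self, n_arms, gamma=0.95, window=50, f=min):
        self.gamma = gamma  # discount factor on the reward history
        self.f = f          # aggregation: min (pessimistic), max (optimistic), or mean
        self.disc = [[1.0, 1.0] for _ in range(n_arms)]           # discounted [alpha, beta]
        self.win = [deque(maxlen=window) for _ in range(n_arms)]  # per-arm sliding window

    def select_arm(self):
        scores = []
        for a, (alpha_d, beta_d) in enumerate(self.disc):
            # Sample from the discount-based posterior.
            theta_disc = random.betavariate(alpha_d, beta_d)
            # Sample from the window-based posterior (uniform prior).
            wins = sum(self.win[a])
            losses = len(self.win[a]) - wins
            theta_win = random.betavariate(1.0 + wins, 1.0 + losses)
            scores.append(self.f([theta_disc, theta_win]))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Discount every arm's counts toward the prior weight, then
        # credit the pulled arm with the observed 0/1 reward.
        for counts in self.disc:
            counts[0] *= self.gamma
            counts[1] *= self.gamma
        self.disc[arm][0] += reward
        self.disc[arm][1] += 1.0 - reward
        self.win[arm].append(reward)
```

The averaged variant can be obtained by passing `f=statistics.mean`; whether the paper discounts all arms or only the pulled one at each step is an assumption of this sketch.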
Recommender Systems were created to support users in situations of information overload. However, users are consciously or unconsciously influenced by several factors in their decision-making. We analysed a historical dataset from a meta-search booking platform with the aim of exploring how these factors influence user choices in the context of online hotel search and booking. Specifically, we focused on the influence of (i) ranking position, (ii) number of reviews, (iii) average rating, and (iv) price on users' click behaviour. Our results confirmed the conventional wisdom that position and price are the "two elephants in the room" heavily influencing user decision-making; thus, they need to be taken into account when, for instance, learning user preferences from clickstream data. Building on this analysis, we performed an online A/B test on the meta-search booking platform, comparing the current policy with a price-based re-rank policy. Our online experiments suggest that, although in offline experiments items with lower prices tend to have a higher Click-Through Rate, in an online setting a price-based re-rank improved the Click-Through Rate only for the first positions of the recommended lists.
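The abstract does not specify the re-rank policy in detail; a minimal reading, assuming each result carries a price field and the list arrives already ordered by the platform's current policy, is to re-sort only the head of the slate by ascending price while leaving the tail untouched:

```python
def price_rerank(items, k=5):
    """Hypothetical price-based re-rank: sort the top-k slate by
    ascending price, leaving the remaining positions unchanged.
    `items` is a list of dicts with a 'price' key, pre-ordered by
    the baseline ranking policy."""
    head = sorted(items[:k], key=lambda it: it["price"])
    return head + items[k:]
```

Restricting the re-sort to the first k positions mirrors the finding that the price signal mattered mostly at the top of the list; the cutoff k and the field name are assumptions.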
Reproducibility is a core principle of science and fundamental to ensuring scientific progress. However, many recent works point out widespread deficiencies in this respect across the AI field, making the reproduction of results impractical or even impossible. We therefore studied the state of reproducibility support at the intersection of Reinforcement Learning and Recommender Systems. We collected a total of 60 papers and analysed them by defining a set of variables covering the most important aspects that enable reproducibility: dataset, pre-processing code, hardware specifications, software dependencies, algorithm implementation, algorithm hyperparameters, and experiment code. Furthermore, we assigned ACM Badges to the selected papers according to the official definitions. We discovered that, as in many other AI domains, the Reinforcement Learning & Recommender Systems field is grappling with a reproducibility crisis: according to our analysis, none of the selected papers was reproducible under a strict application of the ACM Badges definitions.