We introduce the technique of adaptive discretization to design efficient model-based episodic reinforcement learning algorithms in large (potentially continuous) state-action spaces. Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space. From a theoretical perspective, we provide worst-case regret bounds for our algorithm that are competitive with state-of-the-art model-based algorithms; moreover, our bounds are obtained via a modular proof technique that can potentially be extended to incorporate additional structure in the problem. From an implementation standpoint, our algorithm has much lower storage and computational requirements because it maintains a more efficient partition of the state and action spaces. We illustrate this via experiments on several canonical control problems, which show that our algorithm empirically performs significantly better than fixed discretization in terms of both faster convergence and lower memory usage. Interestingly, we observe empirically that while fixed-discretization model-based algorithms vastly outperform their model-free counterparts, the two achieve comparable performance with adaptive discretization.
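To give a rough feel for the core data structure, here is a minimal one-dimensional sketch of adaptive discretization: a cell of the partition is split in half once it has been visited often enough relative to its width, so frequently visited regions end up finer. The class name, splitting rule, and constants are illustrative assumptions, not the paper's exact construction.

```python
class AdaptivePartition:
    """Toy 1-D adaptive discretization of [0, 1]: a cell is split in half
    once it has been visited often enough relative to its width."""

    def __init__(self):
        self.cells = [(0.0, 1.0)]  # intervals that partition [0, 1]
        self.counts = [0]          # visit counts per cell

    def locate(self, x):
        # Return the index of the cell containing x.
        for i, (lo, hi) in enumerate(self.cells):
            if lo <= x < hi or (hi == 1.0 and x == 1.0):
                return i
        raise ValueError(f"{x} is outside [0, 1]")

    def visit(self, x, split_factor=4.0, min_width=1e-3):
        i = self.locate(x)
        self.counts[i] += 1
        lo, hi = self.cells[i]
        # Split once visits reach ~ split_factor / width^2, so heavily
        # visited regions get a progressively finer partition.
        if self.counts[i] >= split_factor / (hi - lo) ** 2 and hi - lo > min_width:
            mid = (lo + hi) / 2.0
            self.cells[i:i + 1] = [(lo, mid), (mid, hi)]
            self.counts[i:i + 1] = [0, 0]
```

After repeatedly visiting points near 0.1, the cells around 0.1 become narrower than those elsewhere, which is what keeps storage proportional to the region the policy actually explores rather than to the whole space.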
Optimizing Mobile Food Pantry Operations Under Demand Uncertainty

Managing complex systems often involves making trade-offs between different objectives. A common example is seeking fairness guarantees in sequential resource allocation problems. For example, mobile food pantries must allocate resources under demand uncertainty with the goal of simultaneously minimizing inefficiency (leftover resources) and envy (deviations in allocations). In this work, we tackle a problem arising from a partnership with the Food Bank of the Southern Tier to optimize its mobile food-pantry operations. We provide an exact characterization of the achievable (envy, efficiency) pairs, showing that any algorithm achieving low envy must suffer high inefficiency, and vice versa. We complement this exact characterization with a simple algorithm capable of achieving any desired point along the trade-off curve.
Matrix estimation or completion has served as a canonical mathematical model for recommendation systems. More recently, it has emerged as a fundamental building block for data analysis as a first step to denoise the observations and predict missing values. Since the dawn of e-commerce, similarity-based collaborative filtering has been used as a heuristic for matrix estimation. At its core, it encodes typical human behavior: you ask your friends to recommend what you may like or dislike. Algorithmically, friends are similar “rows” or “columns” of the underlying matrix. The traditional heuristic for computing similarities between rows has costly requirements on the density of observed entries. In “Iterative Collaborative Filtering for Sparse Matrix Estimation” by Christian Borgs, Jennifer T. Chayes, Devavrat Shah, and Christina Lee Yu, the authors introduce an algorithm that computes similarities in sparse datasets by comparing expanded local neighborhoods in the associated data graph: in effect, you ask friends of your friends to recommend what you may like or dislike. This work provides bounds on the max entry-wise error of their estimate for low rank and approximately low rank matrices, which is stronger than the aggregate mean squared error bounds found in classical works. The algorithm is also interpretable, scalable, and amenable to distributed implementation.
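The neighborhood-expansion idea can be illustrated with a small sketch. Assuming observations stored as a dict keyed by (row, column), the toy code below (not the paper's actual estimator, which also compares ratings along these paths) expands a row's neighborhood through the bipartite data graph, so two rows that share no directly observed column can still be compared via "friends of friends":

```python
from collections import defaultdict

def expanded_neighborhood(obs, row, radius):
    """obs: dict mapping (row, col) -> rating. Returns the set of columns
    reachable from `row` within `radius` alternating row/column hops of
    the bipartite data graph. radius=1 is the classical direct
    neighborhood; radius=2 asks friends of friends."""
    row_to_cols, col_to_rows = defaultdict(set), defaultdict(set)
    for (r, c) in obs:
        row_to_cols[r].add(c)
        col_to_rows[c].add(r)
    cols = set(row_to_cols[row])
    for _ in range(radius - 1):
        cols = {c2 for c in cols for r2 in col_to_rows[c]
                for c2 in row_to_cols[r2]}
    return cols

def overlap_similarity(obs, r1, r2, radius=2):
    """Jaccard overlap of expanded neighborhoods -- a simple stand-in for
    the paper's similarity statistic, used only to illustrate the idea."""
    a = expanded_neighborhood(obs, r1, radius)
    b = expanded_neighborhood(obs, r2, radius)
    return len(a & b) / max(len(a | b), 1)
```

With the dataset {(0,0), (2,0), (2,1), (1,1)}, rows 0 and 1 have zero direct overlap, but their radius-2 neighborhoods coincide via row 2, which is exactly the regime where the expanded comparison pays off.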
We consider the task of tensor estimation, i.e., estimating a low-rank order-3 n × n × n tensor from noisy observations of randomly chosen entries in the sparse regime. In the context of matrix (order-2 tensor) estimation, a variety of algorithms have been proposed and analyzed in the literature, including the popular collaborative filtering algorithm that is extremely well utilized in practice. In the context of tensor estimation, however, progress has been limited: no natural extensions of collaborative filtering are known beyond "flattening" the tensor into a matrix and applying standard collaborative filtering. As the main contribution of this work, we introduce a generalization of the collaborative filtering algorithm for the setting of tensor estimation and argue that it achieves sample complexity that (nearly) matches the conjectured lower bound. Interestingly, our generalization uses the matrix obtained from the "flattened" tensor to compute similarity, as in classical collaborative filtering, but by defining a novel "graph" using it. The algorithm recovers the tensor with mean-squared error (MSE) decaying to 0 as long as each entry is observed independently with probability p = Ω(n^(−3/2+ε)) for any arbitrarily small ε > 0. It turns out that p = Ω(n^(−3/2)) is the conjectured lower bound as well as the "connectivity threshold" of the graph used to compute similarity in our algorithm.
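A minimal sketch of the shared first step, the mode-1 unfolding of the tensor, together with a simple row similarity on the unfolded matrix. The cosine similarity here is only an illustrative stand-in: the paper computes similarity over a novel graph built from this matrix, not directly on its rows.

```python
import numpy as np

def mode1_unfold(T):
    """Mode-1 unfolding: an n x n x n tensor becomes an n x n^2 matrix
    whose row i stacks the slice T[i, :, :] (entry (i, j*n + k) = T[i, j, k])."""
    n = T.shape[0]
    return T.reshape(n, n * n)

def row_cosine(M, i, j):
    """Cosine similarity between rows of the (zero-filled) unfolded
    matrix -- an illustrative stand-in for the paper's similarity."""
    a, b = M[i], M[j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```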
Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, and from public policy to the technology industry. Here we consider situations where classical methods for estimating the total treatment effect on a target population are considerably biased due to confounding network effects, i.e., the fact that the treatment of an individual may impact its neighbors’ outcomes, an issue referred to as network interference or as nonindividualized treatment response. A key challenge in these situations is that the network is often unknown and difficult or costly to measure. We assume a potential outcomes model with heterogeneous additive network effects, encompassing a broad class of network interference sources, including spillover, peer effects, and contagion. First, we characterize the limitations in estimating the total treatment effect without knowledge of the network that drives interference. By contrast, we subsequently develop a simple estimator and efficient randomized design that outputs an unbiased estimate with low variance in situations where one is given access to average historical baseline measurements prior to the experiment. Our solution does not require knowledge of the underlying network structure, and it comes with statistical guarantees for a broad class of models. Due to their ease of interpretation and implementation, and their theoretical guarantees, we believe our results will have significant impact on the design of randomized experiments.
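One way to see why historical baselines help: under additive network effects with Bernoulli(p) treatment assignment, a baseline-adjusted estimator of the form (1/(n p)) Σ_i (Y_i − b_i) is unbiased for the total treatment effect, with no knowledge of the interference graph. The toy simulation below checks this numerically; the ring interference graph, effect magnitudes, and function names are illustrative assumptions, not the paper's experimental setup.

```python
import random

def simulate_tte_estimator(n=50, p=0.3, n_reps=2000, seed=0):
    """Toy check that the baseline-adjusted estimator
    (1 / (n p)) * sum_i (Y_i - b_i) is unbiased for the total treatment
    effect under heterogeneous additive network effects with Bernoulli(p)
    treatment assignment."""
    rng = random.Random(seed)
    nbrs = [(i, (i - 1) % n, (i + 1) % n) for i in range(n)]  # self + ring neighbors
    baseline = [rng.uniform(0.0, 1.0) for _ in range(n)]      # b_i
    effect = [{j: rng.uniform(0.0, 0.5) for j in nbrs[i]}     # c_{ij}: j's effect on i
              for i in range(n)]
    # Ground truth: average outcome when everyone is treated, minus baseline.
    tte = sum(sum(e.values()) for e in effect) / n
    ests = []
    for _ in range(n_reps):
        z = [rng.random() < p for _ in range(n)]              # Bernoulli(p) design
        y = [baseline[i] + sum(e for j, e in effect[i].items() if z[j])
             for i in range(n)]
        ests.append(sum(y[i] - baseline[i] for i in range(n)) / (n * p))
    return tte, sum(ests) / n_reps
```

Averaged over many replications, the estimate concentrates around the true total treatment effect even though the estimator never inspects the interference graph; only the baselines b_i and the treatment probability p are used.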