Reda Ouhamma scite author profile

Reda Ouhamma

2 Publications

7 Citation Statements Received

44 Citation Statements Given

Affiliations

École Centrale de Lille

Publications

Stochastic Online Linear Regression: the Forward Algorithm to Replace Ridge

Ouhamma, Maillard, Perchet

2021

Preprint

We consider the problem of online linear regression in the stochastic setting. We derive high-probability regret bounds for online ridge regression and the forward algorithm. This enables us to compare online regression algorithms more accurately and to eliminate assumptions of bounded observations and predictions. Our study advocates for the use of the forward algorithm in lieu of ridge due to its enhanced bounds and robustness to the regularization parameter. Moreover, we explain how to integrate it into algorithms involving linear function approximation to remove a boundedness assumption without deteriorating theoretical bounds. We showcase this modification in linear bandit settings, where it yields improved regret bounds. Finally, we provide numerical experiments to illustrate our results and endorse our intuitions. Preprint. Under review.
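The abstract does not spell out either update rule. The sketch below assumes the standard textbook definitions: online ridge predicts with the regularized least-squares fit on past rounds, while the forward algorithm (also known as Vovk-Azoury-Warmuth) additionally adds the current feature vector to the regularized Gram matrix before predicting. The function name `online_regression`, the helpers, and the synthetic data are hypothetical and only illustrate the difference between the two predictors; this is not the authors' implementation.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def online_regression(X, y, lam=1.0, forward=False):
    """Return the predictions made round by round (hypothetical helper).

    forward=False: online ridge, fitted on rounds strictly before t.
    forward=True:  forward algorithm, whose Gram matrix also includes x_t.
    """
    d = len(X[0])
    # G_inv is the inverse of lam * I + sum of x_s x_s^T over past rounds.
    G_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d  # sum of y_s * x_s over past rounds
    preds = []
    for x, target in zip(X, y):
        Gx = mat_vec(G_inv, x)   # G^{-1} x (G_inv is symmetric)
        ridge_pred = dot(Gx, b)  # x^T G^{-1} b
        leverage = dot(x, Gx)    # x^T G^{-1} x
        # Sherman-Morrison identity:
        # x^T (G + x x^T)^{-1} b = x^T G^{-1} b / (1 + x^T G^{-1} x)
        preds.append(ridge_pred / (1.0 + leverage) if forward else ridge_pred)
        # Observe the target, then apply the rank-one update to the inverse.
        G_inv = [[G_inv[i][j] - Gx[i] * Gx[j] / (1.0 + leverage)
                  for j in range(d)] for i in range(d)]
        b = [b_i + target * x_i for b_i, x_i in zip(b, x)]
    return preds

# Synthetic stochastic data (an assumption for illustration only).
random.seed(0)
theta_star = [1.0, -2.0, 0.5]
X = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [dot(x, theta_star) + random.gauss(0.0, 0.1) for x in X]

ridge_preds = online_regression(X, y, lam=1.0, forward=False)
forward_preds = online_regression(X, y, lam=1.0, forward=True)
ridge_loss = sum((p - t) ** 2 for p, t in zip(ridge_preds, y))
forward_loss = sum((p - t) ** 2 for p, t in zip(forward_preds, y))
```

Under these definitions the forward prediction equals the ridge prediction divided by 1 + x^T G^{-1} x, so it is always shrunk toward zero, most strongly in early rounds or in directions the learner has rarely seen.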

Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration & Planning

Ouhamma, Basu, Maillard

2023

AAAI

We study the problem of episodic reinforcement learning in continuous state-action spaces with unknown rewards and transitions. Specifically, we consider the setting where the rewards and transitions are modeled using parametric bilinear exponential families. We propose an algorithm that (a) uses penalized maximum likelihood estimators to learn the unknown parameters, (b) injects calibrated Gaussian noise into the reward parameter to ensure exploration, and (c) leverages the linearity of the bilinear exponential family transitions with respect to an underlying RKHS to perform tractable planning. We provide a frequentist regret upper bound for our algorithm which, in the case of tabular MDPs, is order-optimal with respect to H and K, where H is the episode length and K is the number of episodes. Our analysis improves the existing bounds for the bilinear exponential family of MDPs by a factor of the square root of H and removes the handcrafted clipping deployed in existing RLSVI-type algorithms.
