Advait Parulekar scite author profile

Advait Parulekar

3 Publications

13 Citation Statements Received

17 Citation Statements Given

Affiliations

The University of Texas at Austin

Publications

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Vial, Parulekar, Shakkottai, et al. 2021

Preprint

We propose two algorithms for episodic stochastic shortest path problems with linear function approximation. The first is computationally expensive but provably obtains $\tilde{O}(\sqrt{B^3 d^3 K / c_{\min}})$ regret, where $B$ is a (known) upper bound on the optimal cost-to-go function, $d$ is the feature dimension, $K$ is the number of episodes, and $c_{\min}$ is the minimal cost of non-goal state-action pairs (assumed to be positive). The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. Both algorithms are based on an optimistic least-squares version of value iteration analogous to the finite-horizon backward induction approach from Jin et al. [2020]. To the best of our knowledge, these are the first regret bounds for stochastic shortest path that are independent of the size of the state and action spaces.

Locating Conical Degeneracies in the Spectra of Parametric Self-adjoint Matrices

Berkolaiko, Parulekar 2021

SIAM J. Matrix Anal. Appl.

Improved Algorithms for Misspecified Linear Markov Decision Processes

Vial, Parulekar, Shakkottai, et al. 2021

Preprint

For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after $K$ episodes scales as $K \max\{\varepsilon_{\mathrm{mis}}, \varepsilon_{\mathrm{tol}}\}$, where $\varepsilon_{\mathrm{mis}}$ is the degree of misspecification and $\varepsilon_{\mathrm{tol}}$ is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as $K \to \infty$. (P3) It does not require $\varepsilon_{\mathrm{mis}}$ as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of $\varepsilon_{\mathrm{tol}}$, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB algorithm, which Takemura et al. [2021] recently showed satisfies (P3) in the contextual bandit setting.
