Rate-Optimal Contextual Online Matching Bandit

Li, Yuantong; Wang, Chi-Hua; Cheng, Guang; Sun, Will Wei

doi:10.48550/arxiv.2205.03699

2022

Preprint

Abstract: Two-sided online matching platforms have been employed in various markets. However, agents' preferences in the present market are usually implicit and unknown, and thus must be learned from data. With the growing availability of side information involved in the decision process, modern online matching methodology demands the capability to track preference dynamics for agents based on the contextual information. This motivates us to consider a novel Contextual Online Matching Bandit prOblem (COMBO), which allow…

References: 33 publications

Cited by 1 publication (1 citation statement)

“…Recently, there is a growing line of research in the statistics literature for policy learning and/or evaluation in infinite horizons. Some references include Chen et al (2022), Ertefaie and Strawderman (2018), Liao et al (2020), Liao et al (2021), Li et al (2022), Luckett et al (2020), Ramprasad et al (2022), Shi et al (2022), and Xu et al (2020). In the computer science literature, there is a huge literature on developing reinforcement learning (RL) algorithms in infinite horizons.…”

Cited in: Gao, Shi and Song (2023). Deep spectral Q‐learning with application to mobile health. Stat. (Citation type: mentioning, classifier confidence 99%.)

Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time‐varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q‐learning algorithm, which integrates principal component analysis (PCA) with deep Q‐learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
