Yinzhao Dong scite author profile

Background: Reinforcement learning (RL) provides a promising technique to solve complex sequential decision making problems in healthcare domains. Recent years have seen a great progress of applying RL in addressing decision-making problems in Intensive Care Units (ICUs). However, since the goal of traditional RL algorithms is to maximize a long-term reward function, exploration in the learning process may have a fatal impact on the patient. As such, a short-term goal should also be considered to keep the patient stable during the treating process. Methods: We use a Supervised-Actor-Critic (SAC) RL algorithm to address this problem by combining the long-term goal-oriented characteristics of RL with the short-term goal of supervised learning. We evaluate the differences between SAC and traditional Actor-Critic (AC) algorithms in addressing the decision making problems of ventilation and sedative dosing in ICUs. Results: Results show that SAC is much more efficient than the traditional AC algorithm in terms of convergence rate and data utilization. Conclusions: The SAC algorithm not only aims to cure patients in the long term, but also reduces the degree of deviation from the strategy applied by clinical doctors and thus improves the therapeutic effect.

show abstract

Incorporating causal factors into reinforcement learning for dynamic treatment regimes in HIV

Dong

Liu

et al. 2019

BMC Med Inform Decis Mak

View full text Add to dashboard Cite

Background Reinforcement learning (RL) provides a promising technique to solve complex sequential decision making problems in health care domains. However, existing studies simply apply naive RL algorithms in discovering optimal treatment strategies for a targeted problem. This kind of direct applications ignores the abundant causal relationships between treatment options and the associated outcomes that are inherent in medical domains. Methods This paper investigates how to integrate causal factors into an RL process in order to facilitate the final learning performance and increase explanations of learned strategies. A causal policy gradient algorithm is proposed and evaluated in dynamic treatment regimes (DTRs) for HIV based on a simulated computational model. Results Simulations prove the effectiveness of the proposed algorithm for designing more efficient treatment protocols in HIV, and different definitions of the causal factors could have significant influence on the final learning performance, indicating the necessity of human prior knowledge on defining a suitable causal relationships for a given problem. Conclusions More efficient and robust DTRs for HIV can be derived through incorporation of causal factors between options of anti-HIV drugs and the associated treatment outcomes.

show abstract

D3PG: Decomposed Deep Deterministic Policy Gradient for Continuous Control

Dong

2020

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yinzhao Dong

Distributed multi‐agent deep reinforcement learning for cooperative multi‐robot pursuit

Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units

Incorporating causal factors into reinforcement learning for dynamic treatment regimes in HIV

D3PG: Decomposed Deep Deterministic Policy Gradient for Continuous Control

Contact Info

Product

Resources

About