2021
DOI: 10.1002/psp4.12588
Reinforcement learning and Bayesian data assimilation for model‐informed precision dosing in oncology

Abstract: Model‐informed precision dosing (MIPD) using therapeutic drug/biomarker monitoring offers the opportunity to significantly improve the efficacy and safety of drug therapies. Current strategies comprise model‐informed dosing tables or are based on maximum a posteriori estimates. These approaches, however, lack a quantification of uncertainty and/or consider only part of the available patient‐specific information. We propose three novel approaches for MIPD using Bayesian data assimilation (DA) and/or reinforcement learning (RL)…
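The abstract contrasts MAP-based dosing with approaches that quantify uncertainty. As a rough illustration of that distinction (not the paper's implementation), the sketch below compares a MAP point estimate of an individual clearance with a grid-approximated posterior; the one-compartment model, prior, and data values are assumptions chosen for the example.

```python
# Minimal sketch: MAP point estimate vs. full posterior for one individual
# PK parameter (clearance). Model, prior, and data values are illustrative
# assumptions, not from the paper.
import numpy as np
from scipy.optimize import minimize_scalar

dose, V = 100.0, 40.0                 # assumed IV bolus dose [mg], volume [L]
t_obs = np.array([2.0, 6.0, 24.0])    # sampling times [h]
c_obs = np.array([2.1, 1.6, 0.5])     # observed concentrations [mg/L]
cl_pop, omega, sigma = 5.0, 0.3, 0.2  # prior median CL, IIV, residual SD (log scale)

def neg_log_post(log_cl):
    cl = np.exp(log_cl)
    c_pred = dose / V * np.exp(-cl / V * t_obs)  # 1-compartment prediction
    loglik = -0.5 * np.sum((np.log(c_obs / c_pred) / sigma) ** 2)
    logprior = -0.5 * ((log_cl - np.log(cl_pop)) / omega) ** 2
    return -(loglik + logprior)

# MAP: a single point estimate with no uncertainty attached.
map_log_cl = minimize_scalar(neg_log_post, bounds=(0, 4), method="bounded").x

# Full posterior via a simple grid approximation: carries the uncertainty
# that MAP-based dosing discards.
grid = np.linspace(0.5, 3.5, 2000)
w = np.exp(-np.array([neg_log_post(g) for g in grid]))
w /= w.sum()
post_mean = np.sum(w * grid)
post_sd = np.sqrt(np.sum(w * (grid - post_mean) ** 2))
print(f"MAP CL: {np.exp(map_log_cl):.2f} L/h")
print(f"Posterior CL: {np.exp(post_mean):.2f} L/h (log-scale SD {post_sd:.2f})")
```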


Cited by 32 publications (37 citation statements)
References 39 publications (85 reference statements)
“…A possible way to deal with sparse rewards in the light of multiple goals is hindsight experience replay, where different learning episodes are replayed with different goals and the agent can derive reward signals regarding different outcomes [66]. In most applications of RL in healthcare, rewards are coded quantitatively rather than qualitatively, which can be useful for certain use cases where the outcome, in fact, is a metric variable (such as absolute neutrophil count [34]); however, it remains challenging when the outcome first has to be transformed or a priori model building has to be performed manually [29]. Alternatively, preference models can be used as a representation of qualitative feedback to rank the agent’s behavioral trajectories [67, 68].…”
Section: Discussion
confidence: 99%
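The hindsight experience replay idea referenced in the statement above can be sketched in a few lines. This is a generic illustration, not code from any of the cited papers; the state/goal encoding and the sparse reward rule are assumptions.

```python
# Minimal sketch of hindsight experience replay (HER) for sparse-reward,
# multi-goal settings: failed episodes are replayed as if a goal that was
# actually achieved had been the intended goal, yielding reward signal.
import random
from collections import deque

replay = deque(maxlen=10_000)

def reward(achieved_goal, goal):
    # Sparse reward: success only when the achieved outcome matches the goal.
    return 0.0 if achieved_goal == goal else -1.0

def store_episode(transitions, goal):
    """transitions: list of (state, action, next_state, achieved_goal)."""
    for state, action, next_state, achieved in transitions:
        # Standard replay with the originally intended goal...
        replay.append((state, action, reward(achieved, goal), next_state, goal))
        # ...plus a hindsight replay: relabel with an outcome achieved
        # elsewhere in the episode (the usual "future" strategy would
        # restrict this to later steps; simplified here).
        relabeled = random.choice(transitions)[3]
        replay.append((state, action, reward(achieved, relabeled), next_state, relabeled))
```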
“…Their method ranks drug sensitivity prediction algorithms and recommends the optimal algorithms for a given drug–cell line pair in order to achieve optimal responses. To account for chemotherapy-associated toxicity, Maier et al. [34] proposed an RL-based framework that is guided by absolute neutrophil counts for adjusting subsequent drug doses. Using simulated reinforcement trials [35], Zhao et al. [36] applied Q-learning to stage IIIB/IV non-small cell lung cancer and reported optimized first and second treatment lines as well as optimal selection for initiating second-line therapy.…”
Section: Recent Studies Of Reinforcement Learning In Malignant Disease
confidence: 99%
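To make the neutrophil-guided Q-learning idea concrete, here is a minimal tabular sketch. The grade-based state, dose-multiplier actions, toy transition model, and reward shaping are illustrative assumptions, not components of the cited framework.

```python
# Minimal tabular Q-learning sketch in the spirit of neutrophil-guided dose
# adjustment: state = neutropenia grade (0-4), action = relative dose change.
import numpy as np

rng = np.random.default_rng(0)
n_grades = 5
actions = np.array([0.8, 1.0, 1.2])          # dose multipliers (assumed)
Q = np.zeros((n_grades, len(actions)))
alpha, gamma, eps = 0.1, 0.9, 0.1            # learning rate, discount, exploration

def simulate_cycle(grade, dose_mult):
    # Toy stand-in for a PK/PD simulator: higher dose tends to deepen
    # neutropenia, lower dose to relax it.
    p = [0.2, 0.5, 0.3] if dose_mult > 1 else [0.4, 0.4, 0.2]
    return int(np.clip(grade + rng.choice([-1, 0, 1], p=p), 0, n_grades - 1))

def reward(grade):
    # Penalize life-threatening grade-4 neutropenia, mildly reward a
    # therapeutic target window (grade 1-2), as in ANC-guided schemes.
    return -10.0 if grade == 4 else (1.0 if grade in (1, 2) else 0.0)

grade = 0
for _ in range(5000):
    a = rng.integers(len(actions)) if rng.random() < eps else int(Q[grade].argmax())
    nxt = simulate_cycle(grade, actions[a])
    Q[grade, a] += alpha * (reward(nxt) + gamma * Q[nxt].max() - Q[grade, a])
    grade = nxt

print("Greedy dose multiplier per grade:", actions[Q.argmax(axis=1)])
```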
“…Applications of ML to MIPD to date have found that ML models are often able to accurately estimate past drug exposure, 24,25 predict future drug exposure, 26–28 or select doses. 29–32 However, the improvement in accuracy from these earlier approaches comes at the expense of pharmacological interpretability and the ability to simulate patient response to alternative dosing regimens. 24,33,34 An advantage of the combination of ML and PK models as described here is that clinical decision making is augmented by ML while maintaining the ability to forecast patient PKs and extract mechanistic insight from PK parameter estimates.…”
Section: Discussion
confidence: 99%
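The hybrid ML-plus-PK pattern this statement describes can be sketched as a two-step pipeline: an ML regressor proposes an individual clearance from covariates, and a mechanistic PK model then forecasts exposure under candidate regimens. All features, training data, and model choices below are assumptions for illustration, not the cited authors' pipeline.

```python
# Minimal sketch: ML proposes individual PK parameters; a mechanistic PK
# model keeps the ability to simulate alternative dosing regimens.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Toy training data: covariates (weight, age, creatinine) -> clearance [L/h].
X = rng.uniform([40, 20, 0.5], [120, 85, 2.0], size=(500, 3))
cl_true = 0.05 * X[:, 0] * (1.2 / X[:, 2]) * np.exp(rng.normal(0, 0.2, 500))
ml_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, cl_true)

def forecast_concentration(cl, dose, tau, V=40.0, t=np.linspace(0, 48, 49)):
    """1-compartment IV bolus repeated every tau h: interpretable PK forecast."""
    c = np.zeros_like(t)
    for t_dose in np.arange(0, t[-1], tau):
        c += np.where(t >= t_dose, dose / V * np.exp(-cl / V * (t - t_dose)), 0.0)
    return c

# ML step: estimate a new patient's clearance from covariates alone.
patient = np.array([[70.0, 55.0, 1.0]])
cl_hat = ml_model.predict(patient)[0]

# Mechanistic step: simulate alternative regimens -- something a pure
# end-to-end ML dose selector cannot do.
for dose, tau in [(100, 12), (150, 24)]:
    c = forecast_concentration(cl_hat, dose, tau)
    print(f"dose {dose} mg q{tau}h -> trough ~{c[-1]:.2f} mg/L")
```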
“…As an example of high clinical relevance, we focus on paclitaxel causing neutropenia as the most frequent and life‐threatening toxicity in oncology. Models describing paclitaxel‐induced neutropenia form the basis for neutrophil‐guided MIPD to individualize chemotherapy dosing. 18–21 Since the publication of the gold‐standard model for neutropenia, 22 many model variants have been developed, which differ not only in parameter estimates 23–26 but also in their structure. 17,27–29 …”
Section: Introduction
confidence: 99%
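The transit-compartment structure shared by this family of neutropenia models (a proliferating pool feeding maturation transits into circulation, with feedback on proliferation) can be sketched as an ODE system; parameter values and the linear drug-effect and exposure functions below are illustrative assumptions.

```python
# Minimal sketch of a semi-mechanistic transit-compartment neutropenia model:
# proliferating pool -> three maturation transits -> circulating neutrophils,
# with feedback of circulating cells on proliferation. Illustrative values.
import numpy as np
from scipy.integrate import solve_ivp

circ0, mtt, gamma_fb, slope = 5.0, 120.0, 0.17, 0.2  # baseline, MTT [h], feedback, drug slope
ktr = 4.0 / mtt                                      # rate through 4 transit stages

def rhs(t, y, c_drug):
    prol, t1, t2, t3, circ = y
    e_drug = slope * c_drug(t)                       # linear myelosuppressive effect
    feedback = (circ0 / max(circ, 1e-9)) ** gamma_fb
    return [ktr * prol * (1 - e_drug) * feedback - ktr * prol,
            ktr * (prol - t1),
            ktr * (t1 - t2),
            ktr * (t2 - t3),
            ktr * (t3 - circ)]

# Toy exposure: mono-exponential decline after an infusion at t = 0.
c_drug = lambda t: 4.0 * np.exp(-0.1 * t)
sol = solve_ivp(rhs, (0, 500), [circ0] * 5, args=(c_drug,), max_step=1.0)
nadir_idx = sol.y[4].argmin()
print(f"Neutrophil nadir: {sol.y[4][nadir_idx]:.2f} x 10^9/L at t = {sol.t[nadir_idx]:.0f} h")
```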