2022
DOI: 10.1007/978-3-031-20627-6_11
Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space

Cited by 7 publications (8 citation statements) · References 11 publications
“…[Table: Transformer-based RL methods]

Method           | Tokens               | Technique                                   | Architecture    | Benchmark
BooT [74]        | s-a-r-R              | data augmentation, teacher-forcing strategy | decoder         | D4RL [142]
CDT [143]        | s-a                  | state-marginal matching                     | decoder         | Gym, MuJoCo
BDT [143]        | s-a (SLT), h-a (MST) | -                                           | decoder         | Gym, MuJoCo
Scene-Rep [144]  | s                    | SAC                                         | encoder-decoder | SMARTS [145]
SPLT [146]       | s-a                  | -                                           | encoder-decoder | CARLA [147]
Trans-REIN [148] | s                    | REINFORCE                                   | encoder-decoder | TSP instance
STT [149]        | s                    | -                                           | encoder         | Gym, MuJoCo, CausalWorld [150]
Catformer [151]  | -                    | R2D2 [152]                                  | decoder         | Quake III Arena Engine
GTrXL [153]      | s                    | -                                           | decoder         | DMLab-30, Memory Maze
ESPER [154]      | s-a                  | -                                           | encoder         | Gambling, Connect Four, 2048 [155]
AdA [174]        | o-a-r                | auto-curriculum learning (Meta-RL)          | encoder-decoder | XLand 2.0

…Transformer structure (shown in Fig. 11) to enhance parallelization for proximal policy optimization (PPO).…”
Section: Methods (mentioning; confidence: 99%)
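The quoted excerpt closes by noting a Transformer structure (Fig. 11 in the survey) used to enhance parallelization for PPO. As a rough, hedged illustration of that idea, the minimal PyTorch sketch below uses a Transformer encoder as a PPO actor-critic backbone, so all timesteps of a state sequence are processed in one parallel attention pass; every module name and dimension here is our assumption, not the survey's implementation.

```python
import torch
import torch.nn as nn

class TransformerPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.policy_head = nn.Linear(d_model, action_dim)  # action logits for PPO
        self.value_head = nn.Linear(d_model, 1)            # state-value baseline for GAE

    def forward(self, states):
        # states: (batch, seq_len, state_dim); all timesteps are attended in parallel
        h = self.encoder(self.embed(states))
        return self.policy_head(h), self.value_head(h)

policy = TransformerPolicy(state_dim=8, action_dim=4)
logits, values = policy(torch.randn(2, 10, 8))  # -> (2, 10, 4) and (2, 10, 1)
```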
“…More details will be introduced in Sec. IV-A. Wang et al. [74] also provided a perspective by viewing offline RL as a generic sequence-generation problem, adopting a Transformer architecture to model distributions over trajectories.…”
Section: A. Transformer-Based Offline RL (mentioning; confidence: 99%)
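To make the sequence-generation view concrete, here is a minimal sketch, assuming discretized states, actions, and rewards: a trajectory (s₁, a₁, r₁, s₂, …) is flattened into one token stream, and a causal Transformer is trained with a next-token objective. The vocabulary size, context length, and architecture below are illustrative assumptions, not the code of Wang et al. [74].

```python
import torch
import torch.nn as nn

VOCAB = 100        # assumed number of bins after discretizing states/actions/rewards
CONTEXT = 3 * 10   # one (s, a, r) triple per step over an assumed 10-step horizon

class TrajectoryLM(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(CONTEXT, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        t = tokens.shape[1]
        causal = nn.Transformer.generate_square_subsequent_mask(t)  # no peeking ahead
        h = self.tok(tokens) + self.pos(torch.arange(t))
        return self.head(self.blocks(h, mask=causal))

model = TrajectoryLM()
tokens = torch.randint(0, VOCAB, (2, CONTEXT))  # a batch of flattened trajectories
logits = model(tokens)
# standard next-token loss: predict token t+1 from tokens up to t
loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, VOCAB),
                             tokens[:, 1:].reshape(-1))
```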
“…Currently, researchers use DRL models, such as Proximal Policy Optimization (PPO), to find optimal contracts. Inspired by diffusion Q-learning [15], we integrate the diffusion process into conventional DRL to improve its flexibility and exploration ability. To help readers understand the four-step workflow mentioned in Section III-D, we elaborate our diffusion-empowered contract generation step by step.…”
Section: Diffusion-Empowered Contract Generation (mentioning; confidence: 99%)
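A hedged sketch of how a diffusion process can be integrated into DRL training in the spirit of diffusion Q-learning [15]: the policy is a noise-prediction network trained with a denoising (behavior-cloning) loss plus a critic-guidance term that pushes generated contracts toward high Q-values. The vector encoding of contracts, the network shapes, and the noise schedule are our assumptions, not the cited implementation.

```python
import torch
import torch.nn as nn

T = 50                                  # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)   # linear noise schedule (assumption)
alphas_bar = torch.cumprod(1 - betas, dim=0)

class EpsNet(nn.Module):
    """Predicts the noise that was added to a contract c_k, conditioned on (s, k)."""
    def __init__(self, c_dim, s_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(c_dim + s_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, c_dim))
    def forward(self, c_k, s, k):
        return self.net(torch.cat([c_k, s, k.float().unsqueeze(-1) / T], dim=-1))

def diffusion_ql_loss(eps_net, q_net, c0, s, eta=1.0):
    k = torch.randint(0, T, (c0.shape[0],))
    noise = torch.randn_like(c0)
    ab = alphas_bar[k].unsqueeze(-1)
    c_k = ab.sqrt() * c0 + (1 - ab).sqrt() * noise      # forward-noised contract
    eps_pred = eps_net(c_k, s, k)
    bc_loss = ((eps_pred - noise) ** 2).mean()          # denoising / cloning term
    # value guidance via a one-step estimate of the clean contract, scored by the
    # critic (the full method samples through the entire reverse chain instead)
    c0_hat = (c_k - (1 - ab).sqrt() * eps_pred) / ab.sqrt()
    return bc_loss - eta * q_net(torch.cat([s, c0_hat], dim=-1)).mean()

# toy usage with an assumed critic over (state, contract) pairs
q_net = nn.Sequential(nn.Linear(4 + 3, 64), nn.ReLU(), nn.Linear(64, 1))
loss = diffusion_ql_loss(EpsNet(c_dim=3, s_dim=4), q_net,
                         torch.randn(8, 3), torch.randn(8, 4))
```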
“…• Step 2: Explore the policy in latent space. We use a conditional diffusion model to exploit the latent space of contract generation [15]. Specifically, we need to learn the policy π(c₀|s) for generating the optimal contract c₀ under the state s ∈ S.…”
Section: Diffusion-Empowered Contract Generation (mentioning; confidence: 99%)
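To illustrate the sampling side of Step 2, here is a sketch of the reverse denoising loop: starting from Gaussian noise c_T, a learned, state-conditioned denoiser is iterated down to c₀, the generated contract. It follows the standard DDPM update and assumes the same illustrative eps_net interface as the previous sketch; it is not the authors' published code.

```python
import torch

@torch.no_grad()
def sample_contract(eps_net, s, c_dim, T=50):
    """Draw c0 ~ pi(c0|s) by reverse diffusion, conditioned on the state s."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    c = torch.randn(s.shape[0], c_dim)                     # start at c_T ~ N(0, I)
    for k in reversed(range(T)):
        kk = torch.full((s.shape[0],), k)
        eps = eps_net(c, s, kk)                            # predicted noise given (s, k)
        # DDPM posterior mean for step k-1
        c = (c - betas[k] / (1 - alphas_bar[k]).sqrt() * eps) / alphas[k].sqrt()
        if k > 0:
            c = c + betas[k].sqrt() * torch.randn_like(c)  # sampling noise, skipped at k=0
    return c                                               # c_0: the generated contract
```

Omitting the noise at the final step returns the posterior mean, so repeated calls with the same state differ only through the intermediate sampling noise.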