2022
DOI: 10.1007/s12065-022-00778-z

ORAD: a new framework of offline Reinforcement Learning with Q-value regularization

Abstract: This paper presents advanced techniques of training diffusion policies for offline reinforcement learning (RL). At the core is a mean-reverting stochastic differential equation (SDE) that transfers a complex action distribution into a standard Gaussian and then samples actions conditioned on the environment state with a corresponding reverse-time SDE, like a typical diffusion policy. We show that such an SDE has a solution that we can use to calculate the log probability of the policy, yielding an entropy regu…

Citations: cited by 5 publications (4 citation statements)
References: 34 publications
“…[29] As for S 2p, we observed in Figure S16d (Supporting Information) that the four characteristic peaks of Ni–S and Co–S could be classified from low to high energies as 2p3/2, 2p1/2, SO₄²⁻ and satellite peaks. [30] In particular, contrary to monometallic sulfides, bimetallic sulfides (Figure 3i) were categorized from low to high energies into 2p3/2, 2p1/2, M‐S bonds (M = Ni/Co), and satellite peaks. [31] The higher‐energy S 2p1/2 characteristic peaks were derived from low‐coordinated sulfur vacancies.…”
Section: Results (mentioning)
confidence: 99%
“…The porous and thin hollow shell consisting of CoSe₂‐H‐YS facilitated the shortened ion diffusion and fast electrolyte penetration compared to the dense structured CoSe₂‐D‐CS. [48] CoSe₂‐H‐YS for SIBs exhibited reversible discharge capacities of 461, 434, 404, 376, 346, 301, 255, and 209 mA h g⁻¹ at current densities of 0.1, 0.2, 0.5, 1.0, 2.0, 4.0, 6.0, and 8.0 A g⁻¹, respectively, CoSe₂‐H‐YS for KIBs exhibited reversible discharge capacities of 373, 348, 312, 271, 208, 154, 98, and 60 mA h g⁻¹ at current densities of 0.1, 0.2, 0.5, 1.0, 2.0, 3.0, 4.0, and 5.0 A g⁻¹, respectively. In contrast, CoSe₂‐D‐CS exhibited low reversible capacities of 95 and 5 mA h g⁻¹ at high current densities of 8.0 and 5.0 A g⁻¹ for SIBs and KIBs, respectively.…”
Section: Results (mentioning)
confidence: 99%
“…Generally, as the vacancy diffusion is another important ion transport means except for interstitial diffusion, [39] vacancy engineering is an efficient tactic to construct vacancies and strengthen the ion transport rate through the rational optimization of the electronic distribution and adjustment of the intrinsic electrochemical activity of the active materials. [40][41][42] Particularly, constructing sulfur vacancies in metal sulfides (so as to accelerate ion diffusion kinetics by inducing localized electron centers) has been put forward by researchers. [43][44][45][46] For example, Guo's group revealed that generating sulfur vacancies in Ni₃S₂ nanosheets can decrease the band gap, increase the occurrence of richer active sites, and thus enhance electrical conductivity.…”
Section: Introduction (mentioning)
confidence: 99%