Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022
DOI: 10.24963/ijcai.2022/516

Model-Based Offline Planning with Trajectory Pruning

Abstract: Graph neural networks (GNNs) are now popular for solving tasks in non-Euclidean spaces, and most of them learn deep embeddings by aggregating neighboring nodes. However, these methods are prone to problems such as over-smoothing, owing to their single-scale receptive field and their nature as low-pass filters. To address these limitations, we introduce the diffusion scattering network (DSN) to exploit high-order patterns. Observing the complementary relationship between multi-layer GNNs and the DSN, we propose …

Cited by 7 publications (6 citation statements)
References 3 publications
“…Value regularization methods [8, 14, 32] regularize the value function to assign low values to OOD actions. Uncertainty-based and model-based methods [1, 30, 36-38] estimate epistemic uncertainty from value functions or learned models to penalize OOD data. Finally, in-sample learning methods [3, 9, 31] learn the value function entirely within the data to avoid directly querying the Q-function on OOD actions produced by policies.…”
Section: Related Work
confidence: 99%
“…We, instead, aim to offer higher-quality synthetic data for offline training via an uncertainty-based trajectory truncation method. Most relevant to our work are M2AC [33], MOReL [17], and MOPP [45]. MOReL requires each generated single sample (rather than the whole trajectory) at each step to lie in a safe region.…”
Section: Related Work
confidence: 99%
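The trajectory-truncation idea described in this excerpt can be sketched as follows. This is a minimal illustration under assumed names (`truncate_trajectory`, `ensemble_predict`, `threshold` are not from the paper): a model rollout is cut at the first step whose ensemble disagreement exceeds a threshold, so only low-uncertainty prefixes are kept as synthetic data.

```python
import numpy as np

def truncate_trajectory(states, ensemble_predict, threshold=0.1):
    # Keep rollout states only up to (and excluding) the first state
    # whose ensemble disagreement exceeds the threshold.
    kept = []
    for s in states:
        preds = ensemble_predict(s)  # shape: (n_models, state_dim)
        if float(np.std(preds, axis=0).max()) > threshold:
            break
        kept.append(s)
    return kept
```

Truncating the whole remaining trajectory at the first uncertain step (rather than filtering individual samples) keeps the retained prefix temporally consistent.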
“…However, MBPO fails to resolve the issue of extrapolation error in the offline setting. Modern model-based offline RL methods typically rely on techniques such as uncertainty quantification [44, 17], value-function penalization to enforce conservatism [43], or planning [45] to learn meaningful policies from static logged data. Utilizing the learned dynamics model for offline data augmentation has also been explored recently [38, 27].…”
Section: Introduction
confidence: 99%
“…Incorporating the Dynamics Model. Model-based approaches have been widely adopted in RL to improve sample efficiency and have shown good performance and generalization ability in recent offline RL studies [20, 53, 56]. In our work, we introduce a probabilistic dynamics model implemented as a neural network that outputs a Gaussian distribution over the difference between the next and current state, i.e., f(s′|s, a) = N(s + μ_θf(s, a), Σ_θf(s, a)), where μ_θf(s, a) and Σ_θf(s, a) are the parameterized mean and diagonal covariance matrix.…”
Section: Discriminator-Guided Model-Based Imitation Learning (DMIL)
confidence: 99%
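A toy version of the probabilistic dynamics model described in this excerpt is sketched below. It is a stand-in, not the paper's network: μ and the diagonal Σ are fixed linear maps here purely for illustration, whereas a real implementation would parameterize them with a neural network.

```python
import numpy as np

class GaussianDynamics:
    # Predicts a Gaussian over the state *difference*, so the mean of
    # f(s'|s, a) is s + mu(s, a) and the covariance is diagonal.
    def __init__(self, state_dim, action_dim):
        self.W_s = np.zeros((state_dim, state_dim))       # hypothetical weights
        self.W_a = np.ones((state_dim, action_dim)) * 0.1
        self.log_std = np.full(state_dim, -2.0)           # log of diag std

    def predict(self, s, a):
        mu_delta = self.W_s @ s + self.W_a @ a
        cov = np.diag(np.exp(self.log_std) ** 2)          # diagonal Sigma
        return s + mu_delta, cov                          # mean next state, cov

    def sample(self, s, a, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        mean, cov = self.predict(s, a)
        return rng.multivariate_normal(mean, cov)
```

Predicting the state difference rather than the next state directly is a common parameterization: the model only has to learn the (often small) residual dynamics.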
“…The sample-efficiency requirement of offline IL methods reminds us of the success of model-based approaches in the online and offline RL domains [21, 20, 53, 56]. Dynamics models learned from the data can greatly supplement the limited expert data to improve state-action space coverage, leading to potentially improved policy performance and generalizability [20, 56, 15, 5]. However, adopting a model-based approach in offline IL remains an underexplored area [15, 5, 37].…”
Section: Introduction
confidence: 99%