2022
DOI: 10.48550/arxiv.2201.13259
Preprint

Trajectory balance: Improved credit assignment in GFlowNets

Cited by 6 publications (12 citation statements) | References 0 publications
Citation types: 1 supporting, 11 mentioning, 0 contrasting
“…P_B(s | s′; θ)F(s; θ) (22). Several alternative learning objectives for GFlowNets have been proposed, especially for longer trajectories to sample the object y (Malkin et al., 2022; Madan et al., 2022). Trajectory balance (TB; Malkin et al., 2022) is a prominent learning objective for training GFlowNets.…”
Section: Multiple Parent States (mentioning; confidence: 99%)
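For context, the fragment quoted above appears to be the tail of a flow-consistency equation; the standard detailed-balance constraint and the trajectory balance (TB) objective it relates to, in their usual forms from the GFlowNet literature, are

    F(s; \theta)\, P_F(s' \mid s; \theta) \;=\; F(s'; \theta)\, P_B(s \mid s'; \theta)

    \mathcal{L}_{\mathrm{TB}}(\tau) \;=\; \left( \log \frac{Z_\theta \prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t; \theta)}{R(x) \prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1}; \theta)} \right)^{2}

where τ = (s_0 → s_1 → ⋯ → s_n = x) is a complete trajectory and Z_θ is a learned scalar estimate of the partition function. TB enforces consistency over whole trajectories rather than per edge, which is the improved credit assignment the paper's title refers to.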
“…In the PRL framework, as Q_soft is updated, the energy-based policy distribution is also constantly changing. DPO adopts a joint training framework in which the EBM and the GFlowNet are optimized alternately, similar to [99]: the energy function serves as the negative log-reward function for the GFlowNet, which is trained with the trajectory balance [54] objective to sample from the evolving energy-based policies. In turn, the energy function is trained with a soft Bellman backup, for which the GFlowNet provides diverse samples.…”
Section: Diverse Policy Optimization (mentioning; confidence: 99%)
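A minimal sketch of the alternating scheme this statement describes, assuming PyTorch-style components; gfn, ebm, sample_trajectory, soft_bellman_loss, and both optimizers are hypothetical stand-ins, not the actual API of DPO or of [99]:

    def alternating_step(gfn, ebm, gfn_opt, ebm_opt,
                         sample_trajectory, soft_bellman_loss):
        """One alternating update in the scheme described above.

        Every argument is a hypothetical stand-in: gfn exposes a scalar
        parameter log_Z; sample_trajectory(gfn) returns the terminal object
        and per-step forward/backward log-probabilities; soft_bellman_loss
        stands in for the soft Bellman backup on the energy function.
        """
        # (1) GFlowNet step: the current energy acts as a negative
        #     log-reward, and the sampler is trained with trajectory balance.
        x, log_pf, log_pb = sample_trajectory(gfn)
        log_reward = -ebm(x).detach()   # log R(x) = -E(x); EBM frozen here
        tb_loss = (gfn.log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2
        gfn_opt.zero_grad(); tb_loss.backward(); gfn_opt.step()

        # (2) EBM step: the energy is refit with a soft Bellman backup,
        #     using the GFlowNet's diverse samples as training data.
        ebm_loss = soft_bellman_loss(ebm, x)
        ebm_opt.zero_grad(); ebm_loss.backward(); ebm_opt.step()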
“…To train the parameters θ_F and θ_B of the reward-conditional GFlowNet, we use the trajectory balance objective [54], which is optimized along complete trajectories τ = (s…”
Section: Reward-Conditional GFlowNet Training (mentioning; confidence: 99%)
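A minimal PyTorch sketch of the trajectory balance objective [54] over one complete trajectory, as used in the statement above; the reward-conditioning is abstracted into the inputs, and all names are illustrative:

    import torch

    def trajectory_balance_loss(log_Z: torch.Tensor,
                                log_pf: torch.Tensor,
                                log_pb: torch.Tensor,
                                log_reward: torch.Tensor) -> torch.Tensor:
        """Squared log-ratio TB loss for one complete trajectory tau.

        log_Z      : scalar log-partition estimate (in the reward-conditional
                     case, produced by a network conditioned on the reward)
        log_pf     : per-step log P_F(s_{t+1} | s_t; theta_F), shape (n,)
        log_pb     : per-step log P_B(s_t | s_{t+1}; theta_B), shape (n,)
        log_reward : scalar log R(x) for the terminal object x
        """
        delta = log_Z + log_pf.sum() - log_reward - log_pb.sum()
        return delta.pow(2)

    # Toy usage with made-up numbers for a length-3 trajectory:
    log_Z = torch.tensor(0.5, requires_grad=True)
    loss = trajectory_balance_loss(log_Z,
                                   torch.log(torch.tensor([0.5, 0.4, 0.9])),
                                   torch.log(torch.tensor([1.0, 0.5, 0.5])),
                                   torch.tensor(0.0))
    loss.backward()  # gradients flow into log_Z (and, in practice, the policies)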