2020
DOI: 10.48550/arxiv.2012.11401
Preprint

Universal Policies for Software-Defined MDPs

Cited by 1 publication
(3 citation statements)
References 0 publications
“…We train a teacher agent and a solver agent in sequence using the (Gumbel) AlphaZero algorithm. Both agents use a similar 1.6M parameters Dynamic Graph Transformer network [30] (see architecture details in Appendix H). The teacher agent is trained first for 20 AlphaZero iterations.…”
Section: Training Protocol (mentioning)
confidence: 99%
“…One proposal [30] is to use supervised learning for training a universal oracle that is conditioned on the description of arbitrary search problems and can therefore use training data from a large number of heterogeneous sources. In contrast, we propose using reinforcement learning with the insight that nondeterministic search strategies can be used to generate problems in addition to solving them.…”
Section: Related Work (mentioning)
confidence: 99%