Planning from Pixels in Atari with Learned Symbolic Representations

Dittadi, Andrea; Drachmann, Frederik K.; Bolander, Thomas

doi:10.1609/aaai.v35i6.16627

Cited by 4 publications

(11 citation statements)

References 28 publications

“…VAE-IW (Dittadi, Drachmann, and Bolander 2021) extends RIW to learn encodings from screen images. In a training stage, the game is run using RIW until a fixed number of screens are encountered and saved.…”

Section: VAE-IW (mentioning)

confidence: 99%

“…Learning the features from data permits the encoding to be tailored to a specific game, but generating a faithful encoding of a game is non-trivial, and using a static dataset is typically insufficient. For example, Dittadi, Drachmann, and Bolander (2021) save screens that are reached by a Rollout-IW agent using a hand-coded B-PROST feature set, and use those screens to train a Binary-Concrete VAE to produce a game-specific encoding. However, in order for the encoding to be representative of a game, the dataset must include screens from all visually distinct parts, such as separate levels.…”

Section: Online Representation Learning For Atari (mentioning)

confidence: 99%

“…For every configuration, we trained the system with 5 different random seeds, then evaluated the result for 10 episodes (playthroughs) for 50 evaluations total. Following existing work (Junyent, Jonsson, and Gómez 2019; Junyent, Gómez, and Jonsson 2021; Dittadi, Drachmann, and Bolander 2021), we use a discount factor of γ = 0.99 (rather than 0.995 in (Bandres, Bonet, and Geffner 2018)) in line 28. However, note that the reported final scores are undiscounted sums of rewards.…”

Section: Empirical Evaluations (mentioning)

confidence: 99%

“…Resource Constraints: We repeated the setup of previous work (Lipovetzky and Geffner 2012; Bandres, Bonet, and Geffner 2018; Dittadi, Drachmann, and Bolander 2021) where each action is held for 15 frames, which we refer to as taking a single action. We define a single simulator call as an update to the simulator when taking a single action.…”

Section: Empirical Evaluations (mentioning)

confidence: 99%

“…π-IW discretizes the last hidden layer of a policy function as the input. Recently, VAE-IW (Dittadi, Drachmann, and Bolander 2021) obtains a compact binary representation by training a Binary-Concrete VAE offline.…”

Section: Introduction (mentioning)

confidence: 99%

Is Policy Learning Overrated?: Width-Based Planning and Active Learning for Atari

Ayton; Asai

ICAPS, 2022

Width-based planning has shown promising results on Atari 2600 games using pixel input, while using substantially fewer environment interactions than reinforcement learning. Recent width-based approaches have computed feature vectors for each screen using a hand-designed feature set (Rollout-IW) or a variational autoencoder trained on game screens (VAE-IW), and prune screens that do not have novel features during the search. We propose Olive (Online-VAE-IW), which updates the VAE features online using active learning to maximize the utility of screens observed during planning. Experimental results across 55 Atari games demonstrate that it outperforms Rollout-IW by 42-to-11 and VAE-IW by 32-to-20. Moreover, Olive outperforms existing work based on policy learning (π-IW, DQN) trained with 100 times the training budget by 30-to-22 and 31-to-17, and a state-of-the-art data-efficient reinforcement learning method (EfficientZero) trained with the same training budget and run with 1.8 times the planning budget by 18-to-7 in the Atari 100k benchmark, without any policy learning. The source code and the appendix are available at github.com/ibm/atari-active-learning and arxiv.org/abs/2109.15310 .
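
The pruning rule mentioned in the abstract (discard screens that contribute no novel features) can be illustrated with a minimal sketch. It assumes the simplest width-1 novelty test over a fixed-length feature vector per screen. The class name `NoveltyTable` and its method are hypothetical and not taken from the Olive or VAE-IW source code, and the published planners use a more refined, depth-aware novelty measure than this one.

```python
from typing import Sequence, Set, Tuple

# A feature "atom" is a (feature index, feature value) pair.
Atom = Tuple[int, int]


class NoveltyTable:
    """Width-1 novelty check (illustrative only, not the authors' code)."""

    def __init__(self) -> None:
        # Every atom observed so far during the search.
        self.seen: Set[Atom] = set()

    def is_novel(self, features: Sequence[int]) -> bool:
        """Return True if the screen has at least one unseen atom.

        All atoms of the screen are recorded as a side effect, so a
        second screen with the same features is reported as not novel.
        """
        atoms = set(enumerate(features))
        new_atoms = atoms - self.seen
        self.seen |= new_atoms
        return bool(new_atoms)
```

A search would call `is_novel` on the feature vector of each newly generated screen and prune the node when it returns `False`. The feature vector could come from a hand-designed set or from a learned binary encoding, as the abstract describes.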
