“…Arguably, a fundamental bottleneck for pretraining in RL is the difficulty of reusing a single network across vastly different tasks with distinct observation spaces, action spaces, rewards, scenes, and agent morphologies. Preliminary work has explored various aspects of this problem through graph neural networks for morphology generalization (Wang et al., 2018b; Pathak et al., 2019; Chen et al., 2018; Kurin et al., 2020), language for universal reward specification (Jiang et al., 2019; Lynch & Sermanet, 2021; Shridhar et al., 2022), and object-centric action spaces (Zeng et al., 2020; Shridhar et al., 2022; Noguchi et al., 2021). Our work is orthogonal to these: we essentially amortize the RL algorithm itself, expressed as sequence modeling with a Transformer, rather than domain-specific RL information, and our approach can be combined with domain-specific pretraining techniques (Yen-Chen et al., 2020; Lynch & Sermanet, 2021) effortlessly.…”
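
To make the "RL as sequence modeling with a Transformer" framing concrete, the sketch below shows one common way such a model is set up: (observation, action, reward) triples from logged trajectories are interleaved into a single token stream, and a causal Transformer is trained to predict each action from the history up to the current observation. This is a minimal illustration, not the paper's architecture; the class name `RLSequenceModel`, all dimensions, and the choice of discrete actions and a behavior-cloning loss are assumptions for the example.

```python
# Minimal sketch of RL as sequence modeling (illustrative, not the paper's model).
import torch
import torch.nn as nn

class RLSequenceModel(nn.Module):
    def __init__(self, obs_dim=16, n_actions=4, d_model=64, n_layers=2):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)      # embed observations
        self.act_emb = nn.Embedding(n_actions, d_model)  # embed discrete actions
        self.rew_proj = nn.Linear(1, d_model)            # embed scalar rewards
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)        # next-action logits

    def forward(self, obs, act, rew):
        # obs: (B, T, obs_dim), act: (B, T) long, rew: (B, T) float
        # Interleave per-timestep tokens as o_t, a_t, r_t -> (B, 3T, d_model).
        tokens = torch.stack(
            [self.obs_proj(obs), self.act_emb(act), self.rew_proj(rew.unsqueeze(-1))],
            dim=2,
        ).flatten(1, 2)
        # Causal mask so each token only attends to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(tokens, mask=mask)
        # Predict a_t from the observation token at each timestep.
        return self.head(h[:, 0::3])

# Behavior-cloning-style training step on a batch of logged trajectories.
model = RLSequenceModel()
obs = torch.randn(8, 10, 16)
act = torch.randint(0, 4, (8, 10))
rew = torch.rand(8, 10)
logits = model(obs, act, rew)                            # (B, T, n_actions)
loss = nn.functional.cross_entropy(logits.transpose(1, 2), act)
loss.backward()
```

Because the Transformer conditions each action prediction on the full interleaved history of observations, actions, and rewards, the same architecture can, in principle, be trained across heterogeneous tasks once their streams are tokenized, which is the sense in which the sequence model amortizes the RL algorithm rather than any one domain's structure.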