In this work we investigate the capacity of language models to generate explicit, interpretable, and interactive world models of scientific and common-sense reasoning tasks. We operationalize this as the task of generating text games, expressed as hundreds of lines of PYTHON code. To facilitate this task, we introduce BYTESIZED32, a corpus of 32 reasoning-focused text games totalling 20k lines of PYTHON code. We empirically demonstrate that GPT-4 can use these games as templates for single-shot in-context learning, successfully producing runnable games on unseen topics in 28% of cases. When allowed to self-reflect on program errors, game runnability substantially increases to 57%. Because manually evaluating simulation fidelity is labor intensive, we introduce a suite of automated metrics that assess game fidelity, technical validity, adherence to task specifications, and winnability, and show a high degree of agreement with expert human ratings. We pose this as a challenge task to spur further development at the juncture of world modeling and code generation.