“…While transferring results from simulated to real environments remains a challenge, simulated environments are more accessible, less expensive, and allow for the testing of technologies that may not be sufficiently safe for use in the real world (Savva et al, 2019). Additionally, while simulated environments can be used for tasks that do not require language (Anderson et al, 2018a; Batra et al, 2020; Gan et al, 2020; Kant et al, 2022), they play a particularly valuable role in developing language understanding and reasoning capabilities over actions that are currently difficult for physical robots to complete but that we hope will become feasible in the future (Kolve et al, 2017). Much of the work in language understanding for embodied AI is carried out in vision-and-language navigation, where an agent must learn to navigate through a previously unseen environment purely based on natural language route instructions (Anderson et al, 2018b;.…”