“…Other work has explored this problem of scalability in a variety of ways, from early multimodal approaches (Roy & Pentland, 2002), to more recent work using large-scale naturalistic headcam data (Orhan, Gupta, & Lake, 2020; Tsutsui, Chandrasekaran, Reza, Crandall, & Yu, 2020), and studies of the ways in which children or machines take an active role in word learning (Gelderloos, Kamelabad, & Alishahi, 2020; Zettersten & Saffran, 2019). The fact that multimodal neural networks can be trained from scratch, as demonstrated in Experiment 7 and in other work (Harwath et al., 2018; Radford et al., 2021), suggests that these kinds of networks could be further developed to provide a unifying account of artificial word learning in the lab and naturalistic word learning in the wild (Meylan & Bergelson, 2021). Finally, while we attempted to test a broad range of phenomena, our list was by no means exhaustive.…”