2024
DOI: 10.1126/science.adi1374

Grounded language acquisition through the eyes and ears of a single child

Wai Keen Vong,
Wentao Wang,
A. Emin Orhan
et al.

Abstract: Starting around 6 to 9 months of age, children begin acquiring their first words, linking spoken words to their visual counterparts. How much of this knowledge is learnable from sensory input with relatively generic learning mechanisms, and how much requires stronger inductive biases? Using longitudinal head-mounted camera recordings from one child aged 6 to 25 months, we trained a relatively generic neural network on 61 hours of correlated visual-linguistic data streams, learning feature-based representations…
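The model described in the abstract learns by aligning video frames with co-occurring utterances through a contrastive objective. As an illustrative sketch only (not the authors' code; the function name, batch shapes, and temperature value are assumptions), a symmetric InfoNCE-style loss over paired image and text embeddings can be written as:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of img_emb and row i of txt_emb are a matched (positive) pair;
    every other row pairing in the batch serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    n = len(img)

    def cross_entropy_diag(l):
        # log-softmax per row, numerically stabilized
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        # positives lie on the diagonal
        return -logp[np.arange(n), np.arange(n)].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

Training pushes matched frame/utterance embeddings together and mismatched ones apart, which is how word forms come to be linked with their visual referents without word-level supervision.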

Cited by 27 publications (1 citation statement)
References 62 publications
“…The disparity between the simplistic, passive learning environment we provided and the rich, multi-modal, and interactive experiences that shape infant learning is pronounced. Efforts to bridge this gap have included capturing infants' sensory experiences through head-mounted cameras (Vong, Wang, Orhan, & Lake, 2024; Orhan, Wang, Wang, Ren, & Lake, 2024; Orhan, Gupta, & Lake, 2020; Sullivan, Mei, Perfors, Wojcik, & Frank, 2021), eye-tracking (Sheybani, Hansaria, Smith, & Tiganj, n.d.; Mendez, Yu, & Smith, n.d.; Candy et al., 2023), and simulating interaction with the environment via embodied agents (Wykowska, Chaminade, & Cheng, 2016). Our benchmark is poised to serve as a critical testing ground for models trained on these datasets.…”
Section: Discussion
Confidence: 99%