Grounded language acquisition through the eyes and ears of a single child

Vong, Wai Keen; Wang, Wentao; Orhan, A. Emin; Lake, Brenden M.

doi:10.1126/science.adi1374

Cited by 27 publications

(1 citation statement)

References 62 publications


“…The disparity between the simplistic, passive learning environment we provided and the rich, multi-modal, and interactive experiences that shape infant learning is pronounced. Efforts to bridge this gap have included capturing infants' sensory experiences through head-mounted cameras (Vong, Wang, Orhan, & Lake, 2024; Emin Orhan, Wang, Wang, Ren, & Lake, 2024; Orhan, Gupta, & Lake, 2020; Sullivan, Mei, Perfors, Wojcik, & Frank, 2021), eye-tracking (Sheybani, Hansaria, Smith, & Tiganj, n.d.; Mendez, Yu, & Smith, n.d.; Candy et al., 2023), and simulating interaction with the environment via embodied agents (Wykowska, Chaminade, & Cheng, 2016). Our benchmark is poised to serve as a critical testing ground for models trained on these datasets.…”

Section: Discussion; classification: mentioning

confidence: 99%

An Infant-Cognition Inspired Machine Benchmark for Identifying Agency, Affiliation, Belief, and Intention

Li, Yasuda, Dillon et al. 2023

Preprint


Preverbal infants have remarkable abilities to understand others' intentions, beliefs, and social affiliations. These skills lay the groundwork for complex cognitive and language development, crucial for navigating human social dynamics throughout life. In contrast, recent artificial intelligence (AI) systems are often designed to emulate human-like behaviors by extracting common-sense knowledge from language or sensory data. Despite their impressive performance in many respects, they fail to recover some foundational theory-of-mind capacities found in early infancy. This discrepancy highlights a critical gap in creating AI that understands people and thinks like people. To address this, we introduce a suite of theory-of-mind tasks to directly compare the capacities of infants and AI to understand others. Building on our prior work, the Baby Intuition Benchmark (BIB), which challenges machines to understand goal-directed behaviors, this work covers a broader scope of social cognition, probing machine understanding of social affiliations, goal attribution, and false belief. We evaluate Transformer models trained with a self-supervised learning paradigm on BIB and the new tasks. The model shows improved performance over existing baselines, elevating the lower bound of social reasoning capacities in learning-driven machines. However, it still exposes the limitations of learning complex causal relations and complex human mental states from behavioral data alone, underscoring the challenges in achieving human-like theory-of-mind reasoning in AI.
