Interspeech 2019
DOI: 10.21437/interspeech.2019-1523
A Computational Model of Early Language Acquisition from Audiovisual Experiences of Young Infants

Abstract: Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech. However, the feasibility of this hypothesis in terms of real-world infant experiences has remained unclear. This paper presents a step towards a more realistic test of the multimodal bootstrapping hypothesis by describing a neural network model that can learn word segments and their mea…
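The paper's model is not reproduced on this page, but the core idea the abstract describes — linking speech segments to concurrent visual input by scoring them in a shared embedding space — can be sketched as follows. Everything below (the toy embeddings, the loss, the variable names) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings: 4 speech segments and their 4 co-occurring visual contexts,
# each mapped (by hypothetical encoders) into a shared 8-dim space.
speech = rng.standard_normal((4, 8))
vision = speech + 0.1 * rng.standard_normal((4, 8))  # matched pairs are similar

def contrastive_loss(a, b):
    """InfoNCE-style loss: each speech segment should score its own
    visual context higher than the other contexts in the batch."""
    sim = a @ b.T                                   # pairwise similarity matrix
    logits = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # matched pairs on the diagonal

loss_matched = contrastive_loss(speech, vision)
loss_shuffled = contrastive_loss(speech, vision[::-1])  # broken pairing
print(loss_matched < loss_shuffled)
```

Under this kind of objective, correct audio-visual pairings yield a lower loss than mismatched ones, which is the statistical signal a learner could exploit without prior word segmentation.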

Cited by 9 publications (9 citation statements)
References 36 publications (51 reference statements)
“…In addition, when implemented as a neural network with several hidden layers, these hidden layers start to reflect selectivity towards the different types of linguistic units that the input speech consists of. This is in line with earlier findings from neural network models using supervised training (Nagamine et al., 2015; see also Magnuson et al., 2020) or simplified visual input (Räsänen and Khorrami, 2019). Here we show that a similar emergence of units can be observed in learning conditions analogous to cross-situational learning.…”
Section: Discussion (supporting)
confidence: 93%
“…Recently, Räsänen and Khorrami (2019) trained a weakly supervised convolutional neural network (CNN) VGS model to map acoustic speech to the labels of concurrently visible objects attended by the baby hearing the speech, as extracted from head-mounted video data from real infant-caregiver interactions of English-learning infants (Bergelson & Aslin, 2017). They then measured the so-called phoneme selectivity index (PSI) (Mesgarani et al, 2014) of the network nodes and layers.…”
Section: Earlier Related Work (mentioning)
confidence: 99%
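The phoneme selectivity index (PSI) mentioned above is due to Mesgarani et al. (2014): for a given unit, it counts how many phoneme classes evoke responses separable from the unit's strongest-response phoneme. A minimal sketch of that idea, substituting a simple Welch t-statistic threshold for the original statistical procedure (the toy responses and the threshold value are assumptions for illustration):

```python
import numpy as np

# Toy data: responses of one network unit to tokens of four phoneme classes.
# The unit responds strongly to the two vowels, weakly to the fricatives.
responses = {
    "aa": np.array([5.1, 4.9, 5.0, 5.2]),
    "iy": np.array([5.0, 5.1, 4.9, 5.0]),
    "s":  np.array([1.0, 1.2, 0.9, 1.1]),
    "sh": np.array([0.8, 1.0, 1.1, 0.9]),
}

def psi(unit_responses, t_threshold=3.0):
    """PSI sketch: count phoneme classes whose responses are separable
    from the unit's best (strongest-mean) phoneme, via a Welch t-statistic."""
    means = {p: r.mean() for p, r in unit_responses.items()}
    best = max(means, key=means.get)
    a = unit_responses[best]
    count = 0
    for p, b in unit_responses.items():
        if p == best:
            continue
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        if (a.mean() - b.mean()) / se > t_threshold:
            count += 1
    return count

print(psi(responses))  # → 2: "s" and "sh" are separable from "aa"; "iy" is not
```

A unit with a high PSI is narrowly tuned (most phonemes are distinguishable from its preferred one); a PSI near zero indicates a phoneme-unselective unit.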
“…In computational studies, researchers have built models that implement in-principle learning algorithms and created training sets to test the abilities of the models to find statistical regularities in the input data. Some work in modeling word learning has used sensory data collected from adult learners or robots (Roy & Pentland, 2002; Yu & Ballard, 2007; Räsänen & Khorrami, 2019), while many models take symbolic data or simplified inputs (Frank et al., 2009; Kachergis & Yu, 2017; K. Smith, Smith, & Blythe, 2011; Fazly, Alishahi, & Stevenson, 2010; Yu & Ballard, 2007).…”
Section: Introduction (mentioning)
confidence: 99%
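The symbolic cross-situational models this statement refers to typically reduce to accumulating word-referent co-occurrence statistics across ambiguous scenes. A minimal sketch of that mechanism with toy data (the scenes and the simple count-based decision rule are illustrative assumptions, not any cited model):

```python
from collections import Counter, defaultdict

# Toy cross-situational input: each "scene" pairs the words heard with the
# objects present; no single scene disambiguates the word-object mappings.
scenes = [
    (["ball", "dog"], {"BALL", "DOG"}),
    (["ball", "cup"], {"BALL", "CUP"}),
    (["dog", "cup"],  {"DOG", "CUP"}),
]

# Accumulate word-object co-occurrence counts across all scenes.
cooc = defaultdict(Counter)
for words, objects in scenes:
    for w in words:
        for o in objects:
            cooc[w][o] += 1

# Each word's most frequent co-occurring object is its inferred referent.
lexicon = {w: counts.most_common(1)[0][0] for w, counts in cooc.items()}
print(lexicon)  # → {'ball': 'BALL', 'dog': 'DOG', 'cup': 'CUP'}
```

Although every individual scene is ambiguous, the correct mapping emerges from aggregation: each word co-occurs with its true referent in two scenes but with every distractor in only one.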