2020
DOI: 10.31234/osf.io/xwgpk
Preprint

The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions

Abstract:

This study examines the predictability level of words in child-directed speech and shows that (1) languages show similar predictability levels, and (2) these levels are beneficial for word segmentation in both children and adults. The study discusses the role of learnability as a driving force towards Zipfian distributions of words.

Cited by 5 publications (6 citation statements)
References 25 publications
“…This would justify the pattern observed in the second part of the task for the high-TP “words.” Alternatively, it can also be argued that, unlike adults, children might have extracted the statistical regularities embedded in the input by using a simpler strategy, i.e., computing syllable frequency (i.e., the number of times a given syllable appeared in the speech stream) instead of the probability of one syllable to be followed by another syllable in the stream (i.e., TPs). This interpretation is supported by recent findings, suggesting that children learn better in unbalanced than balanced distributions (i.e., in Zipf distributions), as it occurs in natural languages (Lavi-Rotbain and Arnon, 2019, 2020, 2021). Due to cognitive limitations, the children’s immature brain might simply rely on the use of a more “economic” strategy, which may even have facilitated the learning of lower frequency elements later on (Bortfeld et al, 2005; Palmer et al, 2019; Lavi-Rotbain and Arnon, 2021; Soares et al, 2021a).…”
Section: Discussion (supporting)
confidence: 77%
“…Even though extracting statistics from large-scale corpus resources representing authentic language use is computationally intense, the inclusion of statistics obtained from such corpora in constructing the stimulus material to be used in language processing tasks is of importance for one simple reason: It more adequately represents the complexities of the real-world distributional statistics inherent in natural languages - thereby better reflecting the actual types of input humans are exposed to - and going beyond the simple uniform distributions traditionally used in artificial language learning paradigms (for recent exceptions, see, Lavi-Rotbain & Arnon, 2020; Li et al, 2020; Snell & Theeuwes, 2020). Our approach also has the potential to link statistical learning research more tightly with that of a related field of research on language adaptation (for an overview, see, Chang et al, 2012).…”
Section: Discussion (mentioning)
confidence: 99%
“…This finding suggests that, unlike adults, children seem to guide the extraction of the statistical regularities embedded in the input by computing the frequency with which each syllable occurred in the stream rather than syllables' TPs. Indeed, the fact that high-TP 'words' entailed a higher number of syllables, that occurred three-times less often in the stream than low-TP 'words', might have make the immature cognitive system to use a more 'economic' strategy (syllable frequency instead of syllables' TPs) to predict the upcoming events, and to use that knowledge to facilitate the learning of lower frequency elements later on [see 36, 51, 52]. The fact that during the first half of the aSL task presented under explicit conditions children showed larger N400 amplitudes for low-TP 'words', and during the second half of the same task larger N400 amplitudes for high-TP 'words' seem to be in accordance with this rationale, although future research should further explore this issue by manipulating the use of different statistics (conditional vs. distributive) in the input.…”
Section: Discussion (mentioning)
confidence: 99%