Is Word Segmentation Child’s Play in All Languages?

Loukatou, Georgia-Rengina; Moran, Steven; Blasí, Damián E.; Stoll, Sabine; Cristià, Alejandrina

doi:10.18653/v1/p19-1383

Cited by 40 publications (8 citation statements)

References 25 publications


“…Anonymized, 2020). Incidentally, these two strategies seem to perform quite well, and may thus be viable strategies for natural language acquisition (Loukatou, Moran, Blasi, Stoll & Cristia, 2019).…”

Section: Overall Discussion (mentioning, confidence: 98%)

Does morphological complexity affect word segmentation? Evidence from computational modeling

Loukatou, Stoll, Blasí, et al. (2022)

Cognition

Self Cite



“…Therefore, we adopt the Natural Language Processing/Speech Technology standard and use token recall and token precision (e.g., Ludusan, Versteegh, Jansen, Gravier, Cao, Johnson & Dupoux, 2014). This is also the approach adopted by previous work that attempts to compare the overall segmentability of different registers (child- versus adult-directed speech, Cristia et al., 2019; Ludusan, Mazuka, Bernard, Cristia & Dupoux, 2017), and different languages (Caines, Altmann-Richer & Buttery, 2019; Loukatou, Stoll, Blasi & Cristia, 2018; Loukatou et al., 2019), or simply evaluate proposed algorithms (e.g., Daland & Pierrehumbert, 2011; Goldwater et al., 2009; Phillips & Pearl, 2014). These scores are calculated by comparing the output string, which contains hypothesized word breaks an algorithm supplies, against the original sentence containing word breaks.…”

Section: Discussion (mentioning, confidence: 99%)
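The token-level scoring described in the quotation above can be sketched as follows. This is a minimal illustration, not the evaluation code used in any of the cited papers. It assumes that each utterance is a string in which spaces mark word breaks: the gold version comes from the original transcript, and the hypothesis comes from a segmentation algorithm's output. A hypothesized token counts as correct only when both its left and right edges coincide with a gold token. The function names `token_spans` and `token_scores` are hypothetical. The F-score line is an added convenience that the quotation does not mention.

```python
from typing import List, Set, Tuple


def token_spans(utterance: str) -> Set[Tuple[int, int]]:
    """Return (start, end) offsets of each word, counted over the unspaced string."""
    spans: Set[Tuple[int, int]] = set()
    pos = 0
    for word in utterance.split():
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def token_scores(gold: List[str], hypothesis: List[str]) -> Tuple[float, float, float]:
    """Token precision, recall, and F-score pooled over a corpus of utterances."""
    if len(gold) != len(hypothesis):
        raise ValueError("gold and hypothesis must contain the same number of utterances")
    correct = n_gold = n_hyp = 0
    for g_utt, h_utt in zip(gold, hypothesis):
        # Both versions must spell the same symbol sequence once word breaks are removed.
        if "".join(g_utt.split()) != "".join(h_utt.split()):
            raise ValueError(f"utterances differ beyond word breaks: {g_utt!r} vs {h_utt!r}")
        g_spans, h_spans = token_spans(g_utt), token_spans(h_utt)
        correct += len(g_spans & h_spans)  # exact span match means both edges are right
        n_gold += len(g_spans)
        n_hyp += len(h_spans)
    precision = correct / n_hyp if n_hyp else 0.0  # share of hypothesized tokens that are real words
    recall = correct / n_gold if n_gold else 0.0   # share of gold tokens that were recovered
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score
```

As a worked example, consider the hypothesis `thedog ate` scored against the gold `the dog ate`. Only `ate` matches, so precision is 1/2 (one of two hypothesized tokens is correct) and recall is 1/3 (one of three gold tokens is found), which gives an F-score of 0.4. Pooling the counts over the whole corpus before dividing, rather than averaging per-utterance scores, is one common choice. It is assumed here rather than taken from the source.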

“…Third, the most pressing avenue for research in modeling word segmentation involves studying more diverse languages (in the wake of Loukatou et al., 2019). Current evidence suggests sizable differences across languages, and this although only a tiny fraction of the world's languages have been investigated.…”

Section: Discussion (mentioning, confidence: 99%)

“…First, the comparison would be unfair because previous work that has more thoroughly explored these algorithms in English (e.g., Bernard, Thiolliere, Saksida, Loukatou, Larsen, Johnson, Fibla, Dupoux, Daland, Cao & Cristia, 2020) has found that their performance varies enormously as a function of different parameters that are used. Second, it is highly likely that the ranking in their performance varies as a function of language (and perhaps corpora) characteristics (Loukatou, Moran, Blasi, Stoll & Cristia, 2019). Therefore, any discussion of the algorithms’ performance here is limited to providing a backdrop over which to interpret effects associated to bilingual exposure.…”

Section: Introduction (mentioning, confidence: 99%)


Is there a bilingual disadvantage for word segmentation? A computational modeling approach

2021

Self Cite


Since there are no systematic pauses delimiting words in speech, the problem of word segmentation is formidable even for monolingual infants. We use computational modeling to assess whether word segmentation is substantially harder in a bilingual than a monolingual setting. Seven algorithms representing different cognitive approaches to segmentation are applied to transcriptions of naturalistic input to young children, carefully processed to generate perfectly matched monolingual and bilingual corpora. We vary the overlap in phonology and lexicon experienced by modeling exposure to languages that are more similar (Catalan and Spanish) or more different (English and Spanish). We find that the greatest variation in performance is due to different segmentation algorithms and the second greatest to language, with bilingualism having effects that are smaller than both algorithm and language effects. Implications of these computational results for experimental and modeling approaches to language acquisition are discussed.


“…Additional work using adaptor grammars has suggested that they are fairly successful at segmenting child-directed speech in various languages, including German, Spanish, Italian, Farsi, Hungarian, and Japanese (Phillips & Pearl, 2014), while still showing cross-linguistic differences as you would expect across languages where the syllable structure has different levels of complexity (Fourtassi, Börschinger, Johnson & Dupoux, 2013; Johnson, 2008; Loukatou, Stoll, Blasi & Cristia, 2018).…”

Section: Lexically-driven Models (mentioning, confidence: 99%)

Is there a bilingual disadvantage for word segmentation? A computational modeling approach

Fibla, Sebastian-Galles, Cristia (2021)

Preprint


Since there are no systematic pauses delimiting words in speech, the problem of word segmentation is formidable even for monolingual infants. We use computational modeling to assess whether word segmentation is substantially harder in a bilingual than a monolingual setting. Seven algorithms representing different cognitive approaches to segmentation are applied to transcriptions of naturalistic input to young children, carefully processed to generate perfectly matched monolingual and bilingual corpora. We vary the overlap in phonology and lexicon experienced by modeling exposure to languages that are similar (Catalan and Spanish) or more different (English and Spanish). We find that the greatest variation in performance is due to different segmentation algorithms and the second greatest to language, with bilingualism having effects that are smaller than both algorithm and language effects. Implications of these computational results for experimental and modeling approaches to language acquisition are discussed.

