Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1383

Is Word Segmentation Child’s Play in All Languages?

Abstract: When learning language, infants need to break down the flow of input speech into minimal word-like units, a process best described as unsupervised bottom-up segmentation. Proposed strategies include several segmentation algorithms, but only cross-linguistically robust algorithms could be plausible candidates for human word learning, since infants have no initial knowledge of the ambient language. We report on the stability in performance of 11 conceptually diverse algorithms on a selection of 8 typologically d…

Cited by 40 publications (8 citation statements)
References 25 publications
“…Anonymized, 2020). Incidentally, these two strategies seem to perform quite well, and may thus be viable strategies for natural language acquisition (Loukatou, Moran, Blasi, Stoll & Cristia, 2019).…”
Section: Overall Discussion (mentioning)
confidence: 98%
“…Therefore, we adopt the Natural Language Processing/Speech Technology standard and use token recall and token precision (e.g., Ludusan, Versteegh, Jansen, Gravier, Cao, Johnson & Dupoux, 2014). This is also the approach adopted by previous work that attempts to compare the overall segmentability of different registers (child- versus adult-directed speech, Cristia et al., 2019; Ludusan, Mazuka, Bernard, Cristia & Dupoux, 2017), and different languages (Caines, Altmann-Richer & Buttery, 2019; Loukatou, Stoll, Blasi & Cristia, 2018; Loukatou et al., 2019), or simply evaluate proposed algorithms (e.g., Daland & Pierrehumbert, 2011; Goldwater et al., 2009; Phillips & Pearl, 2014). These scores are calculated by comparing the output string, which contains hypothesized word breaks an algorithm supplies, against the original sentence containing word breaks.…”
Section: Discussion (mentioning)
confidence: 99%
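
The comparison described in the statement above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scoring code used by any of the cited papers. It assumes that word breaks are marked by spaces, that a hypothesized token counts as correct only when both of its boundaries match the gold segmentation, and that empty inputs score 0.0. The function names `token_spans` and `token_scores` are hypothetical.

```python
from __future__ import annotations


def token_spans(segmented: str) -> set[tuple[int, int]]:
    """Return (start, end) offsets of each token, counted without spaces."""
    spans = set()
    pos = 0
    for word in segmented.split():
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def token_scores(gold: str, hypothesis: str) -> tuple[float, float]:
    """Return (token precision, token recall) of hypothesis against gold.

    Assumption: both strings contain the same symbols and differ only in
    where the spaces (word breaks) are placed.
    """
    if "".join(gold.split()) != "".join(hypothesis.split()):
        raise ValueError("gold and hypothesis must differ only in word breaks")
    gold_spans = token_spans(gold)
    hyp_spans = token_spans(hypothesis)
    # A token is correct when the same span appears in both segmentations.
    correct = len(gold_spans & hyp_spans)
    precision = correct / len(hyp_spans) if hyp_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


# The hypothesis misses the break between "the" and "dog".
precision, recall = token_scores("the dog barks", "thedog barks")
print(precision, recall)
```

In this example only "barks" is recovered with both boundaries correct, so precision is 1 of 2 hypothesized tokens and recall is 1 of 3 gold tokens. Under-segmentation of this kind lowers recall more than precision, which is one reason both scores are usually reported together.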
“…Third, the most pressing avenue for research in modeling word segmentation involves studying more diverse languages (in the wake of Loukatou et al., 2019). Current evidence suggests sizable differences across languages, and this although only a tiny fraction of the world's languages have been investigated.…”
Section: Discussion (mentioning)
confidence: 99%
“…Additional work using adaptor grammars has suggested that they are fairly successful at segmenting child-directed speech in various languages, including German, Spanish, Italian, Farsi, Hungarian, and Japanese (Phillips & Pearl, 2014), while still showing cross-linguistic differences as you would expect across languages where the syllable structure has different levels of complexity (Fourtassi, Börschinger, Johnson & Dupoux, 2013; Johnson, 2008; Loukatou, Stoll, Blasi & Cristia, 2018).…”
Section: Lexically-driven Models (mentioning)
confidence: 99%