“…In terms of input, some models operate on linguistic abstractions of speech, such as phonemic, phonetic, or orthographic transcripts (e.g., Frank et al., 2010; Goldwater et al., 2009; Nikolaus and Fourtassi, 2021); phonetic or lexical representations derived using pre-trained automatic speech recognition systems (e.g., Fourtassi and Dupoux, 2014; Roy, 2005; Salvi et al., 2012); or simplified representations of acoustic speech, such as formant frequencies of pre-segmented vowels (Coen, 2006; de Boer and Kuhl, 2003). Another set of models operates directly on real continuous speech (e.g., Kamper et al., 2016; Nixon, 2020; Park and Glass, 2008; Schatz et al., 2021; Shain and Elsner, 2020). Besides models that process language input only, there are models that use concurrent visual input in addition to spoken language (e.g., Alishahi et al., 2017; Chrupała et al., 2017; Coen, 2006; Harwath et al., 2019; Harwath et al., 2016; Khorrami and Räsänen, 2021; Nikolaus and Fourtassi, 2021; Roy, 2005).…”