Proceedings of the 2019 Conference of the North 2019
DOI: 10.18653/v1/n19-1156

A Probabilistic Generative Model of Linguistic Typology

Abstract: In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry: we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between …
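To make the abstract's central idea concrete, here is a minimal sketch of one member of the exponential-family matrix factorisation family: a Bernoulli likelihood with a logistic link over a binary languages × features matrix with missing cells, fit by stochastic gradient ascent. It is not the authors' model. The binary encoding, the rank, learning rate and L2 penalty, the function names `fit_logistic_mf` and `predict`, and the toy data are all hypothetical choices for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function: maps a logit to a Bernoulli mean.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_logistic_mf(
    observed: Dict[Tuple[int, int], int],
    n_langs: int,
    n_feats: int,
    rank: int = 2,
    lr: float = 0.1,
    l2: float = 0.01,
    epochs: int = 500,
    seed: int = 0,
) -> Tuple[Matrix, Matrix]:
    """Fit low-rank factors U (languages) and V (features) so that
    P(feature j is on in language i) = sigmoid(U[i] . V[j]).
    `observed` maps (language, feature) to 0/1; missing cells are simply absent."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_langs)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_feats)]
    for _ in range(epochs):
        for (i, j), y in observed.items():
            p = sigmoid(sum(U[i][k] * V[j][k] for k in range(rank)))
            g = y - p  # gradient of the Bernoulli log-likelihood w.r.t. the logit
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient ascent step with an L2 penalty on both factors.
                U[i][k] += lr * (g * v - l2 * u)
                V[j][k] += lr * (g * u - l2 * v)
    return U, V


def predict(U: Matrix, V: Matrix, i: int, j: int) -> float:
    # Predicted probability that language i has feature j switched on.
    return sigmoid(sum(a * b for a, b in zip(U[i], V[j])))


# Hypothetical toy data: features 0 and 1 co-occur; feature 2 is their complement.
toy = {
    (0, 0): 1, (0, 1): 1, (0, 2): 0,
    (1, 0): 1, (1, 1): 1, (1, 2): 0,
    (2, 0): 0, (2, 1): 0, (2, 2): 1,
    (3, 0): 0, (3, 2): 1,  # cell (3, 1) is unobserved
}
U, V = fit_logistic_mf(toy, n_langs=4, n_feats=3)
print(f"P(language 3 has feature 1) = {predict(U, V, 3, 1):.2f}")
```

Because feature 1 patterns with feature 0 in every language where both are observed, the shared latent factors should push the unobserved cell for language 3 below 0.5. This illustrates the covariance argument in the abstract: when one underlying parameter governs several features, observed features inform the missing ones.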

Cited by 22 publications (21 citation statements)
References 38 publications
“…Malaviya et al (2017); Murawaki (2017); Bjerva and Augenstein (2018a); Bjerva et al (2019c)), most such work does not take into account that both phylogenetic and geographic proximity should be controlled for. Languages which have shared common ancestry will often have similar typological features, hence training and evaluating on the same language family will tend to inflate the expected performance of the model (Bjerva et al, 2019a). In the data for this shared task, we make sure to control for both of these factors.…”
Section: Evaluation Setup
mentioning
confidence: 99%
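The control described in this passage amounts to never letting languages from the same group appear on both sides of a train/test split. Below is a minimal sketch of leave-one-group-out splitting. The function name and the toy mapping are hypothetical, and the sketch does not reproduce the shared task's actual data partition. Running the same helper with geographic labels (for example, macro-areas) as the grouping key would address areal proximity in the same way.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def leave_one_group_out(
    lang_to_group: Dict[str, str],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_group, train_langs, test_langs) for each group,
    so that no group contributes languages to both train and test."""
    by_group: Dict[str, List[str]] = defaultdict(list)
    for lang, group in lang_to_group.items():
        by_group[group].append(lang)
    for held_out in sorted(by_group):
        test = sorted(by_group[held_out])
        train = sorted(lang for lang, g in lang_to_group.items() if g != held_out)
        yield held_out, train, test


# Hypothetical toy mapping from ISO 639-3 codes to language families.
families = {
    "deu": "Indo-European", "fra": "Indo-European",
    "fin": "Uralic", "hun": "Uralic",
    "jpn": "Japonic",
}
for fam, train, test in leave_one_group_out(families):
    print(fam, "test:", test, "train:", train)
```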
“…A survey of approaches to prediction of features is provided in Ponti et al (2019a, § 4.3). Some common approaches include prediction based on language representations learned as a by-product of model training (Östling and Tiedemann, 2017; Malaviya et al, 2017; Bjerva and Augenstein, 2018a; Bjerva et al, 2019c) and matrix factorisation (Murawaki, 2017; Bjerva et al, 2019a).…”
Section: Predicting Typological Features
mentioning
confidence: 99%
“…In this approach, the languages are represented as random variables that are explained in terms of other languages related to each other through phylogenetic and spatial neighborhood graphs. Bjerva et al (2019) introduce a generative model inspired by the Chomskyan principles-and-parameters framework, drawing on the correlations between typological features of languages to tackle the novel task of typological collaborative filtering, a concept borrowed from the area of recommender systems.…”
Section: Related Work
mentioning
confidence: 99%
“…The availability of comparable treebanks, i.e. syntactically annotated corpora, for a growing number of typologically distinct languages (most prominently in the collaborative Universal Dependencies project (Nivre et al, 2016)) has led to a recent surge of interest in computational work aiming to detect systematic patterns in the grammatical systems of natural languages and/or to test hypotheses from theoretical work in language typology against empirical evidence. The treebank-based approach (Liu, 2010; Lochbihler, 2017; Gerdes et al, 2019; Bjerva et al, 2019c; Hahn et al, 2020) adds a more data-driven perspective to a strand of research in computational typology (Daumé and Campbell, 2007; Malaviya et al, 2017; Oncevay et al, 2019; Bjerva et al, 2019a; Bjerva et al, 2019b) that is based on carefully curated typological databases such as WALS (Dryer and Haspelmath, 2013) or URIEL.…”
mentioning
confidence: 99%
“…A major focus has been on (a) detecting universals that have the form of an implication between two typological variables, and (b) predicting the value of unknown features in typological databases based on systematic patterns in attested grammatical systems. Graphical models have been widely used to calculate the strength of an implication (Daumé and Campbell, 2007; Lu, 2013; Bjerva et al, 2019b; Bjerva et al, 2019a). While this approach is suitable if one wants to marginalize out the influence of confounding variables, it also constrains the investigated universals to have the form of an implication consisting of one implicand and usually one (but possibly multiple) implicant(s).…”
mentioning
confidence: 99%
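For contrast with the graphical models cited in this passage, the most naive measure of an implication's strength for a universal of the form "if a language has A, it has B" is the proportion of languages with A that also have B. The sketch below computes that proportion over languages where both features are attested. The function name, the feature names, and the data are hypothetical. Unlike the cited graphical models, it does not marginalize out confounds such as shared ancestry or areal contact, so large, internally homogeneous families can inflate its estimates.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Language = Dict[str, Hashable]  # feature name -> value; unattested features are absent


def implication_strength(
    languages: List[Language],
    antecedent: Tuple[str, Hashable],
    consequent: Tuple[str, Hashable],
) -> Tuple[Optional[float], int]:
    """Estimate P(consequent | antecedent) by simple counting.

    Returns (strength, support), where support is the number of languages
    in which the antecedent holds and the consequent feature is attested."""
    feat_a, val_a = antecedent
    feat_b, val_b = consequent
    support = [
        lang for lang in languages
        if feat_a in lang and lang[feat_a] == val_a and feat_b in lang
    ]
    if not support:
        return None, 0
    hits = sum(1 for lang in support if lang[feat_b] == val_b)
    return hits / len(support), len(support)


# Hypothetical binary features "A" and "B" for five languages.
sample = [
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
    {"A": 1, "B": 0},
    {"A": 0, "B": 1},
    {"A": 1},  # B unattested, so this language is excluded
]
strength, n = implication_strength(sample, ("A", 1), ("B", 1))
print(f"P(B=1 | A=1) = {strength:.2f} over {n} languages")  # P(B=1 | A=1) = 0.67 over 3 languages
```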