2017
DOI: 10.1162/coli_a_00293

A Kernel Independence Test for Geographical Language Variation

Abstract: Quantifying the degree of spatial dependence for linguistic variables is a key task for analyzing dialectal variation. However, existing approaches have important drawbacks. First, they are based on parametric models of dependence, which limits their power in cases where the underlying parametric assumptions are violated. Second, they are not applicable to all types of linguistic data: some approaches apply only to frequencies, others to boolean indicators of whether a linguistic variable is present. We presen…

Cited by 17 publications (18 citation statements)
References 44 publications
“…The traditional approach to dialectology is to find the geographical distribution of known lexical alternatives (e.g. you, yall and yinz: (Labov et al, 2005; Nerbonne et al, 2008; Gonçalves and Sánchez, 2014; Doyle, 2014; Huang et al, 2015; Nguyen and Eisenstein, 2016)), the shortcoming of which is that the alternative lexical variables must be known beforehand. There have also been attempts to automatically identify such words from geotagged documents (Eisenstein et al, 2010; Ahmed et al, 2013; Eisenstein, 2015).…”
Section: Related Work (mentioning)
confidence: 99%
“…For that, the Hilbert-Schmidt independence criterion (HSIC) has been used following the recent findings of Nguyen & Eisenstein (2017) [13], who showed that HSIC outperformed other known spatial autocorrelation tests (i.e. Moran's I, Join Count Analysis, and the Mantel test) in the task of detecting geographical language variation.…”
Section: Methods (mentioning)
confidence: 99%
“…In contrast, filtering the entire vocabulary throughout the entire corpus provides a larger set of significant unbiased features. Moreover, as Nguyen & Eisenstein showed [13], the HSIC test is a better alternative to Moran's I when using linguistic variables.…”
Section: Related Work (mentioning)
confidence: 96%
“…Principal Component Analysis, PCA); and 5) visualizing the found groups as regions in a geographical map. This methodology has several known issues that have been noted in the recent literature [12,4,13].…”
Section: A Critical View Of Current Paradigms In Dialectometry (mentioning)
confidence: 99%