2018
DOI: 10.1038/sdata.2018.189

Global-scale phylogenetic linguistic inference from lexical resources

Abstract: Automatic phylogenetic inference plays an increasingly important role in computational historical linguistics. Most pertinent work is currently based on expert cognate judgments. This limits the scope of this approach to a small number of well-studied language families. We used machine learning techniques to compile data suitable for phylogenetic inference from the ASJP database, a collection of almost 7,000 phonetically transcribed word lists over 40 concepts, covering two thirds of the extant world-wide ling…

Citations: cited by 60 publications (83 citation statements)
References: 50 publications
“…There were a total of 29 family trees for each. Details on data availability are provided online 40 .…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…As in most previous Bayesian phylogenetic studies we employ topological constraints, but unlike our predecessors we introduce as many constraints as possible since we are not concerned with language classification but only with the dating of recognized subgroups. A recent study [46] using the maximum likelihood phylogenetic software RAxML [56] showed that trees inferred from word lists in the ASJP database largely agree with the Glottolog [19] classification in terms of quartet distances [57]. We therefore find it justifiable to not take the extra step of inferring all nodes in the phylogenies but rather to fix most of the nodes of the tree based on the Glottolog classification.…”
Section: Plos One
Citation type: mentioning
confidence: 87%
“…Using both cognate and sound class data. Bayesian phylogenetic studies typically use cognate classes to infer phylogenies, but it has been shown [46] that using both cognates and sound classes gives better results for phylogenetic inference when drawing upon (an earlier version of) the ASJP database. We employ the automated cognate identification system described above [42] to assign cognate judgments to word lists.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…We choose WALS [32], Glottolog [24] and Ethnologue [33] as expert-based reference classifications and Gerhard Jäger [34]'s world tree as the - to our knowledge - best computer-inferred global classification available so far. Reference classifications are evolving over time.…”
Section: Choosing and Handling Reference Classification To Benchmark
Citation type: mentioning
confidence: 99%