2008
DOI: 10.1524/stuf.2008.0026

Automated classification of the world's languages: a description of the method and preliminary results

Abstract: An approach to the classification of languages through automated lexical comparison is described. This method produces near-expert classifications. At the core of the approach is the Automated Similarity Judgment Program (ASJP). ASJP is applied to 100-item lists of core vocabulary from 245 globally distributed languages. The output is 29,890 lexical similarity percentages for the same number of paired languages. Percentages are used as a database in a program originally designed for generating phylogenetic tre…
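The pair count in the abstract follows from comparing every language with every other one: 245 languages give 245 × 244 / 2 = 29,890 unordered pairs. The sketch below checks that arithmetic and shows, under stated assumptions, how a similarity percentage over a shared word list could be computed. The function names, the `toy_judge` rule, and the word forms are all hypothetical placeholders; the actual ASJP similarity judgment rules are not given in this excerpt and are not reproduced here.

```python
def pair_count(n_languages: int) -> int:
    """Number of unordered language pairs among n_languages."""
    return n_languages * (n_languages - 1) // 2


def similarity_percent(list_a: dict, list_b: dict, is_similar) -> float:
    """Percentage of shared meanings whose words are judged similar.

    list_a and list_b map a meaning to a transcribed word.
    is_similar is any callable (word_a, word_b) -> bool; the real ASJP
    judgment rules are NOT implemented here.
    """
    shared = [meaning for meaning in list_a if meaning in list_b]
    if not shared:
        return 0.0
    hits = sum(1 for m in shared if is_similar(list_a[m], list_b[m]))
    return 100.0 * hits / len(shared)


def toy_judge(word_a: str, word_b: str) -> bool:
    """Hypothetical stand-in rule: words match if their first two symbols agree."""
    return word_a[:2] == word_b[:2]


# Invented word forms for two imaginary languages (not real data).
lang_a = {"m1": "pata", "m2": "kilu", "m3": "sono"}
lang_b = {"m1": "pati", "m2": "tilu", "m3": "sona"}

print(pair_count(245))  # 29890
print(similarity_percent(lang_a, lang_b, toy_judge))
```

Passing the judgment rule in as a callable keeps the percentage computation separate from whatever similarity criterion is used.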

Cited by 98 publications (85 citation statements)
References 22 publications
“…In the transcription of wordlists of individual languages a simplified system is used, as described in [6]. The symbols in this system are listed and briefly described in Tables 1-2.…”
Section: General Properties Of Sounds and Words
Citation type: mentioning
confidence: 99%
“…These items were selected not for their susceptibility to sound symbolism but rather for their phonological stability across time: specifically, they were identified in [5] as the concepts for which the words were most similar phonologically in languages known to be related by common descent. The database is therefore unbiased with respect to sound symbolism, because it was developed for the purpose of producing automated language classifications [6,7] and for investigating other issues of a historical linguistic nature, including the identification of linguistic homelands [8] and the calculation of dates for the break-up of proto-languages using a technique similar to glottochronology. Consequently, it provides an opportunity to make a pilot study of sound symbolism in basic vocabulary, which is to our knowledge the first of its kind.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…9 The measures of spoken and common native language concern the probability that two people at random for a country pair will share the same spoken language or the same native language, respectively, as the case may be. The measure of linguistic proximity refers instead to similarities of a limited list of words with identical meanings based on expert judgments, where these judgments come from the Automatic Similarity Judgment Program, an international project by ethnolinguists and ethnostatisticians (see Brown et al, 2008). The measure of common official language is the usual binary one.…”
Section: Footnotes
Citation type: mentioning
confidence: 99%
“…From the word-by-word distances, a cardinal measure of phonetic distance is derived by averaging and normalizing to take into account potential similarities in phonetic inventories, that might induce a certain similarity by chance. For computational details see Brown et al (2008).…”
Section: Empirical Strategy
Citation type: mentioning
confidence: 99%
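
The last citation statement describes deriving a language-level distance from word-by-word distances by averaging, then normalizing for similarity that could arise by chance from similar phonetic inventories. A minimal sketch of one way to do this is given below. It assumes the word-level distance is a length-normalized Levenshtein distance and that the chance baseline is the average distance between words for different meanings; neither choice is stated in the quoted text, and the function names and word forms are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed choice)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(list_a: dict, list_b: dict) -> float:
    """Average same-meaning distance divided by the average
    different-meaning distance, used here as the chance baseline."""
    shared = sorted(set(list_a) & set(list_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared meanings")
    same = [normalized_distance(list_a[m], list_b[m]) for m in shared]
    diff = [normalized_distance(list_a[m], list_b[n])
            for m in shared for n in shared if m != n]
    baseline = sum(diff) / len(diff)
    if baseline == 0:
        raise ValueError("chance baseline is zero; cannot normalize")
    return (sum(same) / len(same)) / baseline


# Invented word forms for two imaginary languages (not real data).
lang_a = {"m1": "pata", "m2": "kilu"}
lang_b = {"m1": "pata", "m2": "kilu"}
print(language_distance(lang_a, lang_b))  # 0.0 for identical lists
```

Dividing by the different-meaning average means two languages whose words resemble each other only because they share a small sound inventory are not scored as closer than they should be.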