2021
DOI: 10.12688/openreseurope.13843.1

Automated identification of borrowings in multilingual wordlists

Abstract: Although lexical borrowing is an important aspect of language evolution, there have been few attempts to automate the identification of borrowings in lexical datasets. Moreover, none of the solutions which have been proposed so far identify borrowings across multiple languages. This study proposes a new method for the task and tests it on a newly compiled large comparative dataset of 48 South-East Asian languages. The method yields very promising results, while it is conceptually straightforward and easy to ap…

Citations: cited by 2 publications (5 citation statements)
References: 34 publications
“…Given that computational methods for the detection of cognates are still not able to compete with experts (List et al 2017), our collection thus offers rich material to test and train new methods in the future. In a similar way - given that the collection unifies data on a global basis - scholars can use the data collection to test new methods for the automated identification of borrowings (Zhang et al 2019; List and Forkel 2021b), or to expand upon previous approaches to the automated detection of contact areas (Gast and Koptjevskaja-Tamm 2018; Matsumae et al 2021; Ranacher et al 2021). In addition, we illustrate how the data can be used to automatically extract various phonological and lexical features for individual language varieties.…”
Section: Background and Summary (mentioning)
confidence: 99%
“…Lexibank and lexical data in CLDF formats have been promoted in several ways so far. First, we have conducted detailed studies in which CLDF formats are used along with CLDFBench and the pylexibank software package, illustrating how data aggregation can be successfully carried out (List et al 2018; Rzymski et al 2020), or showing how data can be supplemented in transparent CLDF formats (Wu et al 2020; List and Forkel 2021b). Second, we have created certain flagship projects which showcase specific aspects of CLDF and the advantage of using integrated data (Geisler et al 2020; Ferraz Gerardi et al 2021).…”
Section: Promotion Of Lexibank (mentioning)
confidence: 99%
“…There are discrete features on sound inventory sizes (1–7, number of vowels, consonants, etc.), there are various features on special sound types or individual specific sounds (8–19), there are three prosodic features (20–22), and eight features pertaining to specific sound-meaning relations (also termed "sound symbolism", 23–30).…”
Section: Technical Validation (mentioning)
confidence: 99%
“…Lexibank and lexical data in CLDF formats have been promoted in several ways. First, we have conducted detailed studies in which CLDF formats are used along with CLDFBench and the PyLexibank software package, illustrating how data aggregation can be successfully carried out [60,61], or showing how data can be supplemented in transparent CLDF formats [21,68]. Second, we have created certain flagship projects which showcase specific aspects of CLDF and the advantage of using integrated data [99,100]. Third, we have conducted projects with students and young scholars, who were trained to use our new resources and encouraged to share their knowledge in the form of small blog posts (published at https://calc.hypotheses.org) along with new datasets which bachelor, doctoral, and master students lifted themselves assisted by our team [70,101–103].…”
Section: Usage Notes (mentioning)
confidence: 99%