Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1302

CogNet: A Large-Scale Cognate Database

Abstract: This paper introduces CogNet, a new, large-scale lexical database that provides cognates (words of common origin and meaning) across languages. The database currently contains 3.1 million cognate pairs across 338 languages using 35 writing systems. The paper also describes the automated method by which cognates were computed from publicly available wordnets, with an accuracy evaluated at 94%. Finally, statistics and early insights about the cognate data are presented, hinting at a possible future exploitation of…

Cited by 15 publications (16 citation statements)
References 15 publications
“…The resulting human-annotated dataset contained 8353 words, 62,752 pairs of cognate words and 587,357 pairs of non-cognate words. This set was significantly larger (by 80%) than the one we used in (Batsuren et al 2019a). We divided this dataset into two equal parts: the first 30 concepts for hyperparameter tuning ("tuning") and the second 30 concepts for evaluation ("test").…”
Section: Discussion (mentioning)
confidence: 97%
“…This section describes how CogNet was evaluated on a diverse set of cognate corpora, and how its parameters were tuned to optimise results. With respect to the evaluation dataset used in Batsuren et al (2019a), we have considerably extended the evaluation corpus size, and we have also incorporated a pre-existing cognate database into our evaluations. The creation of self-annotated evaluation datasets despite the existence of cognate databases was desirable due to the latter being either phonetic (and thus not usable for our purposes) or limited to very few language pairs (as the resource described below).…”
Section: Discussion (mentioning)
confidence: 99%
“…As shown in Figure 1, we exploited a cognate database, CogNet (Batsuren et al., 2019, 2021), that has 8.1M cognate pairs, for evidence on cognacy: cog(w_A, w_B) = True is asserted by the presence of the word pair in CogNet.…”
Section: Derivation Enrichment (mentioning)
confidence: 99%