Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.119
|View full text |Cite
|
Sign up to set email alerts
|

Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages

Abstract: Cognates are variants of the same lexical form across different languages; for example "fonema" in Spanish and "phoneme" in English are cognates, both of which mean "a unit of sound". The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian Languages. Our app… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
1
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
2
1

Relationship

0
3

Authors

Journals

citations
Cited by 3 publications
(12 citation statements)
references
References 42 publications
0
1
0
Order By: Relevance
“…The extensive experiments (in Section 6) on three different cognate detection datasets across language families have showcased the efficacy of our weakly-supervised and supervised cognate detection framework. For example, on six different Indian language pairs, our weakly-supervised model (with morphological knowledge) has outperformed the state-of-the-art supervised model proposed by Kanojia et al (2020a) by an average of 9 points of F -score whereas, for Celtic language pairs, it outperformed by 8.6 points of F -score. At the same time, our supervised framework has produced a state-of-the-art performance by outperforming the existing supervised model by an average of 16 points of F -score.…”
Section: Introductionmentioning
confidence: 85%
See 3 more Smart Citations
“…The extensive experiments (in Section 6) on three different cognate detection datasets across language families have showcased the efficacy of our weakly-supervised and supervised cognate detection framework. For example, on six different Indian language pairs, our weakly-supervised model (with morphological knowledge) has outperformed the state-of-the-art supervised model proposed by Kanojia et al (2020a) by an average of 9 points of F -score whereas, for Celtic language pairs, it outperformed by 8.6 points of F -score. At the same time, our supervised framework has produced a state-of-the-art performance by outperforming the existing supervised model by an average of 16 points of F -score.…”
Section: Introductionmentioning
confidence: 85%
“…Kanojia et al (2019a) performed a cognate detection task on Indian languages, which includes a large amount of manual intervention during identification. Kanojia et al (2019b) introduced a character sequence-based recurrent neural network for identifying cognates between Indian language pairs.…”
Section: Related Workmentioning
confidence: 99%
See 2 more Smart Citations
“…In historical linguistics, cognate identification methods are mainly based on three types of similarity measures: semantic, phonetic, and orthographic. For information on semantic similarity, special-purpose multilingual dictionaries, such as the wellknown Swadesh List, are used, and more recently cross-lingual word embeddings (Kanojia et al, 2020(Kanojia et al, , 2021. For orthographic similarity, string metrics (Hauer & Kondrak, 2011;St Arnaud et al, 2017) are often employed, e.g.…”
Section: State Of the Artmentioning
confidence: 99%