Proceedings of the 24th Conference on Computational Natural Language Learning 2020
DOI: 10.18653/v1/2020.conll-1.50

Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping

Abstract: This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture. We augment our models with embeddings representing language ID, part of speech, and other features such as word embeddings. We find that a highly augmented model achieves the highest accuracy in predicting held-out forms, and we investigate other properties of interest learned by our models' representations. We outline extensions to this architecture that can better capture variation in Indo-Aryan…

Cited by 5 publications (5 citation statements)
References 29 publications
“…As a demonstration of the usability of the dataset for computational historical linguistics, we replicate the reflex prediction task of Cathcart and Rama (2020). We train neural models on the task of reflex prediction in Indo-Aryan languages, i.e.…”
Section: Methods (mentioning)
confidence: 99%
“…Other South Asian cognate databases. Cathcart (2019a,b, 2020) and Cathcart and Rama (2020) also previously made use of data from Turner (1962–1966) by scraping the version hosted online by Digital Dictionaries of South Asia.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some of these sources have been used in previous work on South Asian historical linguistics, e.g. Cathcart and Rama (2020); Cathcart (2019b,a, 2020); this is the first attempt to consolidate them. Note some previous work in this direction: while the SARVA project (Southworth, 2005) did not reach fruition, a searchable database of Dravidian cognates was developed by Suresh Kolichala under its auspices.…”
Section: Jambu Etymological Database (mentioning)
confidence: 99%
“…Sanskrit /kʂaːrə/ > Hindi /t͡ʃʰaːr/ 'ashes' as well as /kʰaːr/ 'alkali' (Masica, 1993). The variability of these sound changes has recently been used to statistically model dialect components in IA languages (Cathcart, 2019a,b, 2020; Cathcart and Rama, 2020).…”
Section: Indo-Aryan Sound Changes (mentioning)
confidence: 99%