Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) 2017
DOI: 10.18653/v1/w17-1203

Computational analysis of Gondi dialects

Abstract: This paper presents a computational analysis of Gondi dialects spoken in central India. We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group.

Citations: cited by 4 publications (2 citation statements)
References: 15 publications

“…In addition, existing lexical comparative datasets for Dravidian are limited. Grierson [32] features 19 Dravidian languages; Beine [33] is a survey of Gondi dialects in 46 locations (see [34] for a computational analysis). The Dravidian Etymological Dictionary, revised 2nd edition [22] features information on 29 languages, but is heavily skewed towards written languages, and especially towards the four largest Dravidian languages Kannada, Malayalam, Tamil and Telugu.…”
Section: Methods (mentioning)
confidence: 99%
“…Rama and Çöltekin (2016) and Rama et al. (2017) develop an LSTM-based method for representing the phonological structure of individual word forms across closely related speech varieties. Each string is fed to a unidirectional or bidirectional LSTM autoencoder, which learns a continuous latent multidimensional representation of the sequence.…”
Section: LSTM Autoencoder (mentioning)
confidence: 99%