2018
DOI: 10.29007/9wpx

Dialectones: Finding statistically significant dialectal boundaries using Twitter data

Abstract: Most NLP applications assume that a particular language is homogeneous in the regions where it is spoken. However, each language varies considerably throughout its geographical distribution. To make NLP sensitive to dialects, a reliable, representative and up-to-date source of information that quantitatively represents such geographical variation is necessary. However, some of the current approaches have disadvantages such as the need for parameters, ignoring the geographical coordinates in the analy…

Cited by 4 publications (7 citation statements)
References 7 publications
“…The study was carried out using a classification language model with good performance across multiple languages. In [8], the authors study Spanish language variations in Colombia. The analysis used uni-gram features, and the authors stated that it was challenging to compare Spanish variations against regions identified by other authors using classical dialectometry.…”
Section: Related Work (mentioning)
confidence: 99%
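
As an aside on the uni-gram analysis mentioned in the statement above, here is a minimal illustrative sketch, not the paper's actual dialectone procedure: it counts uni-grams per locality from geotagged tweets and applies a chi-square test to flag a statistically significant difference in the usage of a single word between two localities. The function names, the toy tweets, and the choice of a chi-square test are all assumptions made for illustration.

```python
# Illustrative sketch only; the paper's exact statistical procedure is not
# given in this excerpt. Names and data below are hypothetical.
from collections import Counter
from scipy.stats import chi2_contingency

def unigram_counts(tweets):
    """Count lowercase whitespace-token uni-grams over a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(text.lower().split())
    return counts

def significant_difference(word, counts_a, counts_b, alpha=0.05):
    """Chi-square test on a 2x2 table (target word vs. all other tokens,
    locality A vs. locality B). Returns (is_significant, p_value)."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    table = [
        [counts_a[word], total_a - counts_a[word]],
        [counts_b[word], total_b - counts_b[word]],
    ]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha, p_value

# Toy data standing in for geotagged tweets from two Colombian localities.
bogota = unigram_counts(["que chévere el clima hoy", "chévere la música"])
barranquilla = unigram_counts(["qué vaina tan buena", "la vaina está buena hoy"])
print(significant_difference("vaina", bogota, barranquilla))
```

With real data, such per-word tests would need a multiple-comparison correction before any boundary between localities could be declared statistically significant.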
“…Both language and geographical information are crucial for knowing and understanding the geographies of this online data, as well as the way some information related to economic, social, political, and environmental trends could be used [7]. In contrast, it is not easy to accurately analyze variations of language using only classical dialectometry [8]; therefore, we aim for approaches more related to machine learning and natural language processing, so as to be able to handle very large datasets in our study.…”
Section: Introduction (mentioning)
confidence: 99%
“…Based on a different corpus is the work of Rodríguez-Díaz, Jiménez, Dueñas, Bonilla, and Gelbukh (2018). The research team collected, through the social network Twitter, a corpus of 28 million tweets recorded in 237 localities of Colombian territory and processed them statistically according to different measures and tests.…”
Section: On the Superdialecto Costeño and Its Extent (unclassified)
“…Source: authors' own elaboration based on data from Ávila et al. (2015), Bonilla (2019), IGAC (2002), Mora et al. (2004), and Rodríguez-Díaz et al. (2018 …”
unclassified