2018
DOI: 10.29007/9wpx

Dialectones: Finding statistically significant dialectal boundaries using Twitter data

Abstract: Most NLP applications assume that a particular language is homogeneous in the regions where it is spoken. However, each language varies considerably throughout its geographical distribution. To make NLP sensitive to dialects, a reliable, representative and up-to-date source of information that quantitatively represents such geographical variation is necessary. However, some of the current approaches have disadvantages such as the need for parameters, ignoring the geographical coordinates in the analy…

Cited by 4 publications (7 citation statements)
References 7 publications
“…The study was carried out using a classification language model with good performance across multiple languages. In [8], the authors study Spanish language variations in Colombia. The analysis used uni-gram features, and the authors stated that it was challenging to compare Spanish variations against regions identified by other authors using classical dialectometry.…”
Section: Related Work (mentioning)
confidence: 99%
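
As an aside on the uni-gram analysis mentioned in the statement above, here is a minimal illustrative sketch, not the paper's actual dialectone procedure: it counts uni-grams per locality from geotagged tweets and applies a chi-square test to flag a statistically significant difference in the usage of a single word between two localities. The function names, the toy tweets, and the choice of a chi-square test are all assumptions made for illustration.

```python
# Illustrative sketch only; the paper's exact statistical procedure is not
# given in this excerpt. Names and data below are hypothetical.
from collections import Counter
from scipy.stats import chi2_contingency

def unigram_counts(tweets):
    """Count lowercase whitespace-token uni-grams over a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(text.lower().split())
    return counts

def significant_difference(word, counts_a, counts_b, alpha=0.05):
    """Chi-square test on a 2x2 table (target word vs. all other tokens,
    locality A vs. locality B). Returns (is_significant, p_value)."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    table = [
        [counts_a[word], total_a - counts_a[word]],
        [counts_b[word], total_b - counts_b[word]],
    ]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha, p_value

# Toy data standing in for geotagged tweets from two Colombian localities.
bogota = unigram_counts(["que chévere el clima hoy", "chévere la música"])
barranquilla = unigram_counts(["qué vaina tan buena", "la vaina está buena hoy"])
print(significant_difference("vaina", bogota, barranquilla))
```

With real data, such per-word tests would need a multiple-comparison correction before any boundary between localities could be declared statistically significant.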
“…Both language and geographical information are crucial for knowing and understanding the geographies of this online data, as well as the way some information related to economic, social, political, and environmental trends could be used [7]. In contrast, it is not easy to accurately analyze variations of language using only classical dialectometry [8]; therefore, we aim for approaches more related to machine learning and natural language processing, so as to be able to handle very large datasets in our study.…”
Section: Introduction (mentioning)
confidence: 99%
“…Based on a different corpus is the work of Rodríguez-Díaz, Jiménez, Dueñas, Bonilla, and Gelbukh (2018). The research team collected, through the social network Twitter, a corpus of 28 million tweets recorded in 237 localities of Colombian territory and processed them statistically according to different measures and tests.…”
Section: On the Superdialecto Costeño and Its Extent (unclassified)
“…Source: authors' own elaboration based on data from Ávila et al. (2015), Bonilla (2019), IGAC (2002), Mora et al. (2004), and Rodríguez-Díaz et al. (2018 …”
unclassified