Proceedings of the 10th Workshop on Building and Using Comparable Corpora 2017
DOI: 10.18653/v1/w17-2505

Toward a Comparable Corpus of Latvian, Russian and English Tweets

Abstract: Twitter has become a rich source for linguistic data. Here, a possibility of building a trilingual Latvian-Russian-English corpus of tweets from Riga, Latvia is investigated. Such a corpus, once constructed, might be of great use for multiple purposes including training machine translation models, examining cross-lingual phenomena and studying the population of Riga. This pilot study shows that it is feasible to build such a resource by collecting and analysing a pilot corpus, which is made publicly available …

Cited by 1 publication (2 citation statements). References 2 publications.

“…Even though the usage and popularity of Twitter have stopped rapidly growing and even dropped in recent years, it still has a considerable amount of loyal users who keep on sharing everything from worldwide events to random personal details with their followers. We decided to focus on one of the random personal details that people share, specifically, anything to do with food consumption and related topics.…”
Section: Introduction (mentioning)
confidence: 99%
“…Several corpora of Latvian tweets exist in prior work, but none of them are domain-specific and have been collected over an extensive period of time. Milajevs [1] collected and analysed 1.4 million tweets geo-located in Riga, Latvia from April 2017 to July 2018 and 60 thousand tweets [2] from November 2016 to March 2017. Pinnis [3] collected and analysed 3.8 million tweets of Latvian politicians, companies, media, and users who interacted from August 2016 to July 2018. There are also several data sets of general sentiment-annotated tweets [4], [5], [3] amounting to 14,781 tweets in total.…”
Section: Introduction (mentioning)
confidence: 99%