Toward a Comparable Corpus of Latvian, Russian and English Tweets

Milajevs, Dmitrijs

doi:10.18653/v1/w17-2505

Cited by 1 publication

(2 citation statements)

References 2 publications

“…Even though the usage and popularity of Twitter have stopped rapidly growing and even dropped in recent years, it still has a considerable amount of loyal users who keep on sharing everything from worldwide events to random personal details with their followers. We decided to focus on one of the random personal details that people share, specifically, anything to do with food consumption and related topics.…”

Section: Introduction (mentioning)

confidence: 99%

“…Several corpora of Latvian tweets exist in prior work, but none of them are domain-specific and have been collected over an extensive period of time. Milajevs [1] collected and analysed 1.4 million tweets geo-located in Riga, Latvia from April 2017 to July 2018 and 60 thousand tweets [2] from November 2016 to March 2017. Pinnis [3] collected and analysed 3.8 million tweets of Latvian politicians, companies, media, and users who interacted from August 2016 to July 2018. There are also several data sets of general sentiment-annotated tweets [4], [5], [3] amounting to 14,781 tweets in total.…”

Section: Introduction (mentioning)

confidence: 99%

What Can We Learn from Almost a Decade of Food Tweets

Sprogis, Rikters

2020

Frontiers in Artificial Intelligence and Applications

We present the Latvian Twitter Eater Corpus, a set of tweets in the narrow domain related to food, drinks, eating and drinking. The corpus has been collected over a time span of more than 8 years and includes over 2 million tweets accompanied by additional useful data. We also separate two sub-corpora of question and answer tweets and sentiment-annotated tweets. We analyse the contents of the corpus and demonstrate use-cases for the sub-corpora by training domain-specific question-answering and sentiment-analysis models using the data from the corpus.
