2021
DOI: 10.48550/arxiv.2110.03664
Preprint

TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels

Abstract: The widespread usage of social networks during mass convergence events, such as health emergencies and disease outbreaks, provides instant access to citizen-generated data that carry rich information about public opinions, sentiments, urgent needs, and situational reports. Such information can help authorities understand the emergent situation and react accordingly. Moreover, social media plays a vital role in tackling misinformation and disinformation. This work presents TBCOV, a large-scale Twitter dataset c…

Citations: Cited by 2 publications (3 citation statements)
References: 31 publications
“…In addition to the aforementioned resources, several datasets have been released to study the COVID-19 pandemic on Twitter, providing oftentimes useful metadata (geolocation, sentiment, gender, etc) in addition to raw tweet ids (Banda et al 2021; Chen et al 2020; Lopez and Gallemore 2021; Imran, Qazi, and Ofli 2021).…”
Section: Related Datasets
type: mentioning
confidence: 99%
“…To study the discourse around COVID-19 in the Arab region, we use the TBCOV dataset (Imran et al 2021), an extension of the GeoCoV-19 dataset we sampled and annotated to train our models. TBCOV consists of two billion multilingual and geolocated tweets about COVID-19 spanning from February 1, 2020, until March 31, 2021.…”
Section: Large-scale Analysis
type: mentioning
confidence: 99%
“…The three labeled datasets described above were then used to train multiple deep learning models, including CNN, BiLSTM, and BERT to automatically classify a tweet related to COVID-19 into one of the 11 topics we have identified. The best classifier BERT was then applied on a second geotagged dataset of tweets that also contains tweets related to COVID-19 spanning the period from February 1, 2020, to March 31, 2021 (Imran et al 2021).…”
Section: Introduction
type: mentioning
confidence: 99%