Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER) 2021
DOI: 10.18653/v1/2021.fever-1.9
FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German

Abstract: As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an 'infodemic': a flood of disinformation and the spread of conspiracy theories leading to health threats and the division of society. To combat this infodemic, there is an urgent need for benchmark datasets that can help researchers develop and evaluate models geared towards automatic detection of disinformation. While there are increasing efforts to create adequate, open-source benchmark datasets for English, comparable resource…

Cited by 9 publications (6 citation statements) | References 44 publications
“…As pre-trained models for the English datasets, BERT (Devlin et al. 2019) and RoBERTa (Liu et al. 2019) have been used by multiple studies. Some researchers also used domain-specific transformers (Kotonya and Toni 2020) such as SciBERT (Beltagy et al. 2019) or applied domain adaptation to transformer embeddings (Dharawat et al. 2020; Hossain et al. 2020). Mattern et al. (2021) augmented BERT representations with features representing users and post interactions.…”
Section: Used Retrieval
Mentioning confidence: 99%
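The augmentation strategy mentioned in this citation statement (combining a transformer's text representation with user and post-interaction features) can be sketched roughly as follows. This is a minimal illustration only: the German checkpoint name, the number of metadata features, and the classifier head are assumptions, not the cited authors' implementation.

```python
# Sketch (assumption, not the cited authors' code): augmenting a pre-trained
# BERT representation with user/post-interaction features for fake-news
# classification.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AugmentedFakeNewsClassifier(nn.Module):
    def __init__(self, model_name="bert-base-german-cased",
                 n_meta_features=8, n_classes=2):
        super().__init__()
        # Pre-trained encoder; the German checkpoint is an assumption.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Classifier over the concatenation of the [CLS] embedding and metadata.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + n_meta_features, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, input_ids, attention_mask, meta_features):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]            # [CLS] token embedding
        combined = torch.cat([cls, meta_features], dim=-1)
        return self.classifier(combined)

# Usage sketch: tokenize an article and pass placeholder interaction features
# (e.g. retweet/reply counts, user statistics).
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
batch = tokenizer(["Beispieltext eines Artikels"], return_tensors="pt",
                  truncation=True, padding=True)
meta = torch.zeros(1, 8)  # hypothetical feature vector
logits = AugmentedFakeNewsClassifier()(batch["input_ids"],
                                        batch["attention_mask"], meta)
```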
“…The internet is a popular and accessible source of health information (Percheski and Hargittai 2011; Marton and Choo 2012) that has even become a first choice for some individuals seeking information about their health conditions or medical advice before consulting a physician (Gualtieri 2009). Patients could become more engaged with their treatment decisions (Stevenson et al. 2007) and feel more confident as they acquire more information from the internet (Oh and Lee 2012).…”
Section: Introduction
Mentioning confidence: 99%
“…The ongoing COVID-19 pandemic has sparked bioNLP research to leverage or contextualize information about the disease and virus from social media. A number of studies explore detecting COVID-19-related misinformation and fact-checking (Hossain et al., 2020; Chen and Hasan, 2021; Mattern et al., 2021; Saakyan et al., 2021, i.a.). Others have looked into monitoring information surrounding the virus using social media (Cornelius et al., 2020; Hu et al., 2020).…”
Section: Related Work
Mentioning confidence: 99%
“…Online information can be gathered from different sources such as social media websites, search engines, the homepages of news organizations, or fact-checking websites. There are a few publicly available datasets for fake news classification, such as BuzzFeed News, LIAR [15], and BS Detector. These datasets have been widely used in research papers for determining the veracity of news; the following sections briefly describe the sources of the datasets used in this work.…”
Section: Data Collection and Analysis
Mentioning confidence: 99%
“…Collected data need to be pre-processed, that is, cleaned, transformed, and integrated before they can go through the training procedure [16]. The dataset we used is described below. LIAR: this dataset is collected from the fact-checking website PolitiFact via its API [15]. It includes 12,836 human-labelled short statements sampled from diverse contexts, including news releases, TV or radio interviews, campaign speeches, and so on.…”
Section: Data Collection and Analysis
Mentioning confidence: 99%
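As a rough illustration of the pre-processing step described in this citation statement, the sketch below loads one LIAR-style TSV split, normalises the statements, and collapses the six PolitiFact labels into a binary scheme. The file path, column layout, and label mapping are assumptions made for illustration, not details taken from the cited work.

```python
# Sketch (assumptions throughout): loading and cleaning LIAR-style TSV data
# before training.
import pandas as pd

# First eight columns of the LIAR TSV release; this layout is an assumption
# for illustration.
COLUMNS = ["id", "label", "statement", "subject", "speaker",
           "job", "state", "party"]

def load_and_clean(path: str) -> pd.DataFrame:
    """Load one LIAR-style TSV split and apply basic cleaning."""
    raw = pd.read_csv(path, sep="\t", header=None)
    df = raw.iloc[:, :len(COLUMNS)].copy()
    df.columns = COLUMNS
    # Normalise whitespace and drop empty statements.
    df["statement"] = (df["statement"].astype(str)
                       .str.replace(r"\s+", " ", regex=True)
                       .str.strip())
    df = df[df["statement"].str.len() > 0].reset_index(drop=True)
    # Collapse the six PolitiFact labels into a binary fake/real scheme
    # (a common simplification, not mandated by the dataset itself).
    fake_labels = {"false", "pants-fire", "barely-true"}
    df["binary_label"] = df["label"].isin(fake_labels).astype(int)
    return df

# train_df = load_and_clean("liar_dataset/train.tsv")  # hypothetical path
```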