The negative impact of false information on social networks is rapidly growing. Current research on the topic focused on the detection of fake news in a particular context or event (such as elections) or using data from a short period of time. Therefore, an evaluation of the current proposals in a long-term scenario where the topics discussed may change is lacking. In this work, we deviate from current approaches to the problem and instead focus on a longitudinal evaluation using social network publications spanning an 18-month period. We evaluate different combinations of features and supervised models in a long-term scenario where the training and testing data are ordered chronologically, and thus the robustness and stability of the models can be evaluated through time. We experimented with 3 different scenarios where the models are trained with 15-, 30-, and 60-day data periods. The results show that detection models trained with word-embedding features are the ones that perform better and are less likely to be affected by the change of topics (for example, the rise of COVID-19 conspiracy theories). Furthermore, the additional days of training data also increase the performance of the best feature/model combinations, although not very significantly (around 2%). The results presented in this paper build the foundations towards a more pragmatic approach to the evaluation of fake news detection models in social networks.
The proliferation of false information on social networks is one of the hardest challenges in today's society, with implications capable of changing users perception on what is a fact or rumor. Due to its complexity, there has been an overwhelming number of contributions from the research community like the analysis of specific events where rumors are spread, analysis of the propagation of false content on the network, or machine learning algorithms to distinguish what is a fact and what is "fake news". In this paper, we identify and summarize some of the most prevalent works on the different categories studied. Finally, we also discuss the methods applied to deceive users and what are the next main challenges of this area.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.