Purpose
Owing to the increased accessibility of internet and related technologies, more and more individuals across the globe now turn to social media for their daily dose of news rather than traditional news outlets. With the global nature of social media and hardly any checks in place on posting of content, exponential increase in spread of fake news is easy. Businesses propagate fake news to improve their economic standing and influencing consumers and demand, and individuals spread fake news for personal gains like popularity and life goals. The content of fake news is diverse in terms of topics, styles and media platforms, and fake news attempts to distort truth with diverse linguistic styles while simultaneously mocking true news. All these factors together make fake news detection an arduous task. This work tried to check the spread of disinformation on Twitter.
Design/methodology/approach
This study carries out fake news detection using user characteristics and tweet textual content as features. For categorizing user characteristics, this study uses the XGBoost algorithm. To classify the tweet text, this study uses various natural language processing techniques to pre-process the tweets and then apply a hybrid convolutional neural network–recurrent neural network (CNN-RNN) and state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) transformer.
Findings
This study uses a combination of machine learning and deep learning approaches for fake news detection, namely, XGBoost, hybrid CNN-RNN and BERT. The models have also been evaluated and compared with various baseline models to show that this approach effectively tackles this problem.
Originality/value
This study proposes a novel framework that exploits news content and social contexts to learn useful representations for predicting fake news. This model is based on a transformer architecture, which facilitates representation learning from fake news data and helps detect fake news easily. This study also carries out an investigative study on the relative importance of content and social context features for the task of detecting false news and whether absence of one of these categories of features hampers the effectiveness of the resultant system. This investigation can go a long way in aiding further research on the subject and for fake news detection in the presence of extremely noisy or unusable data.