Purpose
This paper examines trends over the past decade in research that uses natural language processing to study the use and analysis of propaganda in social media. To this end, it conducts a comprehensive bibliometric review of studies focusing on the use, identification and analysis of propaganda in social media.
Design/methodology/approach
This work examines research papers retrieved from the Scopus database, which indexes a large body of peer-reviewed literature and provides the access interfaces required for a bibliometric study. The paper covers relevant publications from 2010 to early 2020 and analyzes them using tools such as VOSviewer and Biblioshiny.
Findings
This bibliometric survey shows that propaganda in social media is studied more in the social sciences, where the subject is primarily rooted, but the field of computer science is catching up, and the subject has recently shifted toward computer science. Research on propaganda in social media shows a positive growth trend. The keyword analysis shows that propaganda in social media is being studied in conjunction with issues such as fake news, political astroturfing, terrorism and radicalization.
Research limitations/implications
The lack of highly cited papers, as revealed by the co-citation analysis, implies intermittent contributions by researchers. Propaganda in social media is becoming a global phenomenon, and its ill effects are evident in developing countries as well. This indicates considerable scope for researchers in other countries to focus on issues specific to their own regions. This study is confined to data captured from the Scopus database; hence, some important recent publications may not have been included.
Originality/value
The novelty of this work lies in a thorough bibliometric analysis of the topic, presented in several forms, such as a mind map, co-occurrence and co-citation analyses, a Sankey plot and topic dendrograms, produced with bibliometric tools such as VOSviewer and Biblioshiny.
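Co-occurrence analysis, one of the forms listed above, counts how often two keywords are assigned to the same paper. The Python sketch below illustrates plain full counting over hypothetical keyword lists (the function name `keyword_cooccurrence` and the sample data are illustrative assumptions, not Scopus records). It is not how VOSviewer or Biblioshiny compute their maps; those tools may additionally apply occurrence thresholds, fractional counting or normalization.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count how often each pair of keywords appears in the same paper (full counting).

    Hypothetical helper: each paper contributes 1 to every unordered keyword pair it contains.
    """
    counts = Counter()
    for keywords in papers:
        # Normalize case/whitespace and drop duplicate keywords within one paper
        unique = sorted({k.strip().lower() for k in keywords})
        # Sorted input makes each pair a canonical (alphabetical) tuple
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical author-keyword lists, one list per paper (illustrative only)
papers = [
    ["propaganda", "social media", "fake news"],
    ["propaganda", "social media", "terrorism", "radicalization"],
    ["Propaganda", "fake news", "astroturfing"],
]

cooc = keyword_cooccurrence(papers)
for (a, b), n in cooc.most_common(3):
    print(f"{a} -- {b}: {n}")
```

In a real study the pair counts would become edge weights in a keyword network, which is what VOSviewer visualizes as clusters.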
In this digital era, people rely on the internet for news consumption. Because people are free to express their opinions on social media, much of the information shared on the internet is loaded with propaganda. Propagandist content is intended to influence public opinion. In the mainstream media and at prominent news agencies, the authors' and the agencies' own biases may affect news content. Hence, there is a need to detect propaganda spread through news articles. Detecting and classifying propagandist text requires standard, high-quality, annotated datasets. A few datasets are available for propaganda classification; however, they are mostly in English. Hindi is the most spoken language in India, and efforts are needed to detect propagandist content in it. This research work introduces two new datasets, H-Prop and H-Prop-News, which consist of Hindi news articles annotated as propaganda or non-propaganda. The H-Prop dataset is generated by translating 28,630 news articles from the QProp dataset. The H-Prop-News dataset contains 5,500 news articles collected from 32 prominent Hindi news websites. We experiment with the proposed datasets using four supervised machine learning models combined with different feature vectors and word embeddings. Our experiments achieve 87% accuracy using Logistic Regression with TF-IDF feature vectors. The datasets provide high-quality labeled news articles in Hindi and open new avenues for researchers to explore techniques for analyzing and classifying propaganda in Hindi text.
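The best reported configuration pairs TF-IDF feature vectors with a Logistic Regression classifier. The pure-Python sketch below shows how such a pipeline fits together under stated assumptions: whitespace tokenization, the smoothed IDF variant log((1 + N) / (1 + df)) + 1, L2-normalized vectors and batch gradient descent with L2 regularization. The function names, hyperparameters and toy English sentences are hypothetical; this is not the authors' implementation, which the abstract does not describe, and it does not reproduce the 87% figure.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1 (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term count times IDF, L2-normalized; terms unseen in training are ignored."""
    tf = Counter(t for t in doc.split() if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient descent on L2-regularized logistic loss over sparse dict vectors."""
    w, b = Counter(), 0.0  # Counter returns 0.0-like 0 for unseen terms
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(w[t] * v for t, v in x.items()) + b) - label
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] -= lr * (grad_w[t] / m + l2 * w[t])
        b -= lr * grad_b / m
    return w, b


def predict(x, w, b):
    """Return 1 (propaganda) if P(propaganda | x) >= 0.5, else 0."""
    return 1 if sigmoid(sum(w[t] * v for t, v in x.items()) + b) >= 0.5 else 0


# Hypothetical toy corpus: 1 = propaganda, 0 = non-propaganda
train_docs = [
    "the enemy lies to you only our leader tells the truth",
    "wake up patriots the traitors are destroying our nation",
    "they want to destroy everything we love fight back now",
    "the city council approved the budget for road repairs",
    "rainfall this week was slightly above the seasonal average",
    "the company reported quarterly revenue in line with forecasts",
]
labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(train_docs)
X = [tfidf(d, idf) for d in train_docs]
w, b = train_logreg(X, labels)

for text in ["traitors lie to destroy our nation", "council reported rainfall above average"]:
    print(text, "->", predict(tfidf(text, idf), w, b))
```

For Hindi news articles, a real pipeline would typically need a language-aware tokenizer and a library implementation (for example, scikit-learn) rather than this hand-rolled version.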