Abstract-Webpages with terrorist and extremist content are key factors in the recruitment and radicalization of disaffected young adults who may then engage in terrorist activities at home or fight alongside terrorist groups abroad. This paper reports on advances in techniques for classifying data collected by the Terrorism and Extremism Network Extractor (TENE) webcrawler, a custom-written program that browses the World Wide Web, collecting vast amounts of data, retrieving the pages it visits, analyzing them, and recursively following the links out of those pages. The textual content is subjected to enhanced classification through software analysis, using the Posit textual analysis toolset, generating a detailed frequency analysis of the syntax, including multi-word units and associated part-of-speech components. Results are then deployed in a knowledge extraction process using knowledge extraction algorithms, e.g., from the WEKA system. Indications are that the use of the data enrichment through application of Posit analysis affords a greater degree of match between automatic and manual classification than previously attained. Furthermore, the incorporation and deployment of these technologies promises to provide public safety officials with techniques that can help to detect terrorist webpages, gauge the degree of extremism of their content, discriminate between webpages that do or do not require a concerted response, and take appropriate action where warranted.
Disinformation attacks that make use of social media platforms, e.g., the attacks orchestrated by the Russian “Internet Research Agency” during the 2016 U.S. Presidential election campaign and the 2016 Brexit referendum in the UK, have led to increasing demands from governmental agencies for AI tools that are capable of identifying such attacks in their earliest stages, rather than responding to them in retrospect. This research undertaken on behalf of the Canadian Armed Forces and Department of National Defence. Our ultimate objective is the development of an integrated set of machine-learning algorithms which will mobilize artificial intelligence to identify hostile disinformation activities in “near-real-time.” Employing The Dark Crawler, the Posit Toolkit, TensorFlow (Deep Neural Networks), plus the Random Forest classifier and short-text classification programs known as LibShortText and LibLinear, we have analysed a wide sample of social media posts that exemplify the “fake news” that was disseminated by Russia’s Internet Research Agency, comparing them to “real news” posts in order to develop an automated means of classification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.