The rise of social media has led to an escalating online cyber-war waged through hateful and violent comments or speeches, and even slick videos, that promote extremism and radicalization. Detecting cyber-extreme content on microblogging sites, specifically Twitter, is a challenging and evolving research area because the content is short, noisy, context-dependent, and dynamic. Relevant tweets were crawled using query words and then carefully labelled into two classes: Extreme (with two sub-classes: pro-Afghanistan government and pro-Taliban) and Neutral. An Exploratory Data Analysis (EDA) using Principal Component Analysis (PCA) was performed on the tweet data, represented by Term Frequency-Inverse Document Frequency (TF-IDF) features, to reduce the high-dimensional data space to a low-dimensional (usually 2-D or 3-D) space. The PCA-based visualization showed better cluster separation between the two classes (extreme and neutral), whereas cluster separation within the sub-classes of the extreme class was not clear. The paper also discusses the pros and cons of applying PCA as an EDA technique to textual data, which is usually represented by a high-dimensional feature set. Furthermore, classification algorithms such as naïve Bayes, K-Nearest Neighbors (KNN), random forest, Support Vector Machine (SVM), and ensemble classification methods (with bagging and boosting) were applied both to the PCA-reduced features and to the complete feature set (TF-IDF features extracted from n-gram terms in the tweets). The analysis showed that SVM achieved an average accuracy of 84% in comparison with the other classification models. It is pertinent to mention that this is a novel research work on Twitter content analysis using machine learning methods in the context of the Afghanistan war zone.
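The feature pipeline described in the abstract (TF-IDF weights over n-gram terms, followed by a PCA projection to two dimensions) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `ngrams`, `tfidf_matrix`, and `pca_project` are hypothetical, the smoothed IDF formula is one common variant chosen for illustration, PCA is approximated with power iteration, and the four example strings are invented toy inputs rather than tweets from the study.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_matrix(docs, n_max=2):
    """Build a dense TF-IDF matrix (one row per document) and its sorted vocabulary."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    vocab = sorted({term for c in counts for term in c})
    n_docs = len(docs)
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + sum(t in c for c in counts))) + 1
           for t in vocab}
    rows = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against empty documents
        rows.append([c[t] / total * idf[t] for t in vocab])
    return rows, vocab


def pca_project(rows, k=2, iters=200):
    """Project rows onto (approximately) the top-k principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]  # centered data
    components = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # fixed, non-uniform start vector
        for _ in range(iters):
            # Remove directions that were already found.
            for c in components:
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            # Apply X^T X to v without forming the d x d covariance matrix.
            scores = [sum(a * b for a, b in zip(row, v)) for row in x]
            v = [sum(scores[i] * x[i][j] for i in range(n)) for j in range(d)]
            norm = math.sqrt(sum(a * a for a in v)) or 1.0
            v = [a / norm for a in v]
        components.append(v)
    return [[sum(a * b for a, b in zip(row, c)) for c in components] for row in x]


# Invented toy inputs, for illustration only.
toy_docs = [
    "peace talks begin",
    "attack on peace talks",
    "weather is nice today",
    "nice weather today",
]
features, vocabulary = tfidf_matrix(toy_docs)
points_2d = pca_project(features, k=2)
for doc, point in zip(toy_docs, points_2d):
    print(doc, [round(p, 3) for p in point])
```

The 2-D points can be plotted for visual inspection, and either the full feature rows or the reduced points can be passed to a classifier, which mirrors the two settings compared in the abstract. In practice an established library would normally replace these hand-written routines.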
This novel method was inspired by cartographic maps in the geology area, and the technique has been used to reintroduce into the visualization space the information that is lost through the distortion introduced by the non-linear mapping. This distortion is quantified as Magnification Factors, which are computed and then visualized together as cartogram maps. We improved interpretability by applying a linear model through drtoolbox; in the resulting plot, cyan circles represent HLA-A, red plus signs represent HLA-B, and blue squares represent HLA-C. The basic motivation for this study was that clustering and classification techniques had previously been used for large data sets, whereas here the analysis is carried out through drtoolbox in MATLAB. The data were visualized for better understanding. The data were aligned by class I locus: HLA-A, HLA-B, and HLA-C. The data were available in the form of groups; when aligned horizontally, there were 372 rows and 12458 columns. After sorting, 180 columns remained, and the data were then checked column by column. The dashes present in the data were replaced by the letter displayed at the top of each column. The data coding was done on 12458 rows, and the data were converted into nominal form. The consensus sequence was checked next; its purpose is to determine how often each letter occurs in a column. The most frequent letter in a column was coded as binary 1 and the remaining letters as 0. Once the data had been converted to binary form, the models were applied. If the data are linear, a linear model is better, and if the data are non-linear, a non-linear model is better; the choice depends on the results obtained for the data. In this study, however, the non-linear models gave the worst visualization, whereas PCA, which is a linear model, gave a much better one.
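The preprocessing steps described in this passage (filling dashes from the top row of each column, finding the most frequent letter per column, and coding that letter as 1 and all others as 0) can be sketched in Python as follows. This is an illustrative reading of the description, not the cited study's MATLAB code: the function names are hypothetical, the top row is assumed to act as the reference sequence, and the four short sequences are invented toy data, not real HLA alignments.

```python
from collections import Counter


def fill_gaps(sequences, reference, gap="-"):
    """Replace each gap character with the reference letter for that column."""
    return ["".join(ref if ch == gap else ch for ch, ref in zip(seq, reference))
            for seq in sequences]


def consensus(sequences):
    """Return the most frequent letter in each column of equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sequences))


def binary_encode(sequences):
    """Code each position as 1 if it matches the column consensus, else 0."""
    cons = consensus(sequences)
    return [[1 if ch == c else 0 for ch, c in zip(seq, cons)] for seq in sequences]


# Invented toy alignment; the first row is assumed to be the reference.
toy_alignment = ["GSHS", "-S-T", "G-HS", "AS-S"]
filled = fill_gaps(toy_alignment, reference=toy_alignment[0])
encoded = binary_encode(filled)
for seq, row in zip(filled, encoded):
    print(seq, row)
```

The resulting 0/1 matrix is the kind of input to which a dimensionality-reduction model such as PCA could then be applied. Ties between equally frequent letters are broken here by order of first appearance, which is an arbitrary choice.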