Society and individuals are negatively influenced, both politically and socially, by the widespread increase of fake news, whether generated by humans or machines. In the era of social networks, the rapid turnover of news makes it challenging to evaluate its reliability promptly. Automated fake news detection tools have therefore become a crucial requirement. To address this issue, a hybrid neural network architecture that combines the capabilities of CNN and LSTM is used with two different dimensionality reduction approaches, Principal Component Analysis (PCA) and Chi-Square. This work proposes to employ the dimensionality reduction techniques to reduce the dimensionality of the feature vectors before passing them to the classifier. To develop the reasoning, this work acquired a dataset from the Fake News Challenge (FNC) website, which has four types of stances: agree, disagree, discuss, and unrelated. The nonlinear features are fed to PCA and Chi-Square, which provide more contextual features for fake news detection. The motivation of this research is to determine the relative stance of a news article towards its headline. The proposed model improves results by ∼4% and ∼20% in terms of accuracy and F1-score, respectively. The experimental results show that PCA outperforms both Chi-Square and state-of-the-art methods, with 97.8% accuracy.
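As a rough illustration of the Chi-Square reduction step mentioned above, the following minimal sketch scores each token against a class label and keeps the top-scoring ones. It is not the authors' implementation: the function names `chi_square_score` and `select_top_k`, the set-of-tokens document representation, and the binary label (for example, related versus unrelated) are all assumptions made for brevity, whereas the FNC data has four stances. PCA is not shown.

```python
def chi_square_score(present, labels):
    """Chi-square statistic of a binary feature against a binary label.

    Uses the standard 2x2 contingency-table formula
    N * (AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)).
    """
    n = len(labels)
    a = sum(1 for p, y in zip(present, labels) if p and y)      # present, positive
    b = sum(1 for p, y in zip(present, labels) if p and not y)  # present, negative
    c = sum(1 for p, y in zip(present, labels) if not p and y)  # absent, positive
    d = n - a - b - c                                           # absent, negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Feature or label is constant, so it carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k):
    """Rank vocabulary tokens by chi-square score and keep the k best.

    `docs` is a list of token sets (a hypothetical, simplified representation).
    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    scores = {
        tok: chi_square_score([tok in doc for doc in docs], labels)
        for tok in vocab
    }
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


if __name__ == "__main__":
    toy_docs = [
        {"hoax", "claim"},
        {"hoax", "report"},
        {"weather", "claim"},
        {"weather", "report"},
    ]
    toy_labels = [True, True, False, False]
    kept, all_scores = select_top_k(toy_docs, toy_labels, k=2)
    print(kept, all_scores)
```

Only the retained tokens would then be used to build the reduced feature vectors passed on to the classifier.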
Quora is a growing platform comprising a user-generated collection of questions and answers, which are created, edited, and organized by the users. The enormous number of users on the Quora website makes it unavoidable that different users post multiple questions with similar intent, which raises the issue of duplicate questions. Effectively detecting duplicate questions would make it easier to find high-quality answers and save time, which in turn would improve the user experience for writers and readers on Quora. In this paper, the Quora Question Pairs dataset is collected from Kaggle for the detection of duplicate questions. First, three types of word embeddings (Google News vector embedding, FastText crawl embedding with 300 dimensions, and FastText crawl subword embedding with 300 dimensions) are implemented individually to vectorize all the questions and train the model. The final features used for prediction are a blend of these three types of word embeddings. Then, a Siamese MaLSTM ("Ma" for Manhattan distance) neural network model is applied to predict duplicate questions in the dataset. Finally, the model is tested on 100,000 pairs of questions. The experiments show that the proposed model achieves 91.14% accuracy, which is better than the state-of-the-art models. Index terms: duplicate question pair detection, text mining, deep learning, MaLSTM, word embedding.
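The "Ma" in MaLSTM refers to comparing the two question encodings by Manhattan (L1) distance. A minimal sketch of that comparison is given below, assuming the commonly used form exp(-||h1 - h2||_1), which maps the distance to a similarity in (0, 1]. The vectors stand in for the final LSTM hidden states of the two questions; the function names and the 0.5 decision threshold are illustrative assumptions, not values taken from the paper.

```python
import math


def manhattan_similarity(h1, h2):
    """Similarity exp(-L1 distance) between two equal-length encodings.

    Identical vectors give 1.0; the value decays towards 0 as the
    Manhattan distance between the encodings grows.
    """
    if len(h1) != len(h2):
        raise ValueError("encodings must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-l1_distance)


def is_duplicate(h1, h2, threshold=0.5):
    """Label a pair as duplicate when the similarity reaches the threshold.

    The threshold of 0.5 is an assumed default for illustration only.
    """
    return manhattan_similarity(h1, h2) >= threshold


if __name__ == "__main__":
    # Hypothetical hidden states for two questions.
    q1 = [0.20, -0.10, 0.05]
    q2 = [0.25, -0.05, 0.00]
    print(manhattan_similarity(q1, q2), is_duplicate(q1, q2))
```

Because both questions pass through the same (Siamese) encoder, the similarity depends only on how far apart the two encodings are, not on the order of the pair.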
Efficient word representation techniques (word embeddings) combined with modern machine learning models have shown reasonable improvement on automatic text classification tasks. However, the effectiveness of such techniques has not yet been evaluated when the word vector representation available for training is insufficient. The Convolutional Neural Network (CNN) has achieved significant results in pattern recognition, image analysis, and text classification. This study investigates the application of the CNN model to text classification problems through experimentation and analysis. We trained our classification model with a prominent word embedding generation model, FastText, on six publicly available benchmark datasets: AG News, Amazon Full, Amazon Polarity, Yahoo Question Answer, Yelp Full, and Yelp Polarity. Furthermore, the proposed model has been tested on the non-benchmark Twitter US Airlines dataset as well. The analysis indicates that using FastText as the word embedding is a very promising approach.