The rapid flow of information over the web and its ease of access have heightened fears of the rampant spread of misinformation. This poses a health threat and an unprecedented global problem affecting people's lives, and addressing it requires the detection of misinformation. Recent techniques in this area focus on static models based on feature extraction and classification. However, data may change at different time intervals, and its veracity needs to be checked as it is updated. The literature lacks models that can handle incremental data, check the veracity of data, and detect misinformation. To fill this gap, the authors propose a novel Veracity Scanning Model (VSM) that detects misinformation in the healthcare domain by iteratively fact-checking content as it evolves over time. In this approach, healthcare web URLs are classified as legitimate or non-legitimate using sentiment analysis as a feature, document similarity measures to perform fact-checking of URLs, and incremental learning to handle newly arriving data. Experimental results show that the Jaccard distance measure outperformed the other techniques, reaching an accuracy of 79.2% with the Random Forest classifier, while the cosine similarity measure achieved a lower accuracy of 60.4% with the Support Vector Machine classifier. In addition, when implemented as an algorithm, Euclidean distance achieved accuracies of 97.14% and 98.33% on the training and test data, respectively.
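As a hedged illustration of the feature-based setup the abstract describes (document similarity measures plus a sentiment polarity feature feeding a standard classifier), the sketch below combines a Jaccard distance, a TF-IDF cosine similarity, and a precomputed polarity score into a small Random Forest pipeline. The scraped page texts, trusted reference article, polarity values, and labels are toy placeholders, not the paper's data or the exact VSM pipeline.

```python
# Minimal sketch, assuming page text has already been scraped from each healthcare
# URL and a trusted reference article is available for comparison. Feature choices
# and toy data are illustrative, not the authors' exact implementation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def jaccard_distance(a: str, b: str) -> float:
    """Jaccard distance between the word sets of two documents."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def build_features(page_texts, reference_text, sentiment_scores):
    """One row per URL: [Jaccard distance, cosine similarity, sentiment polarity]."""
    tfidf = TfidfVectorizer().fit(page_texts + [reference_text])
    ref_vec = tfidf.transform([reference_text])
    rows = []
    for text, polarity in zip(page_texts, sentiment_scores):
        cos = cosine_similarity(tfidf.transform([text]), ref_vec)[0, 0]
        rows.append([jaccard_distance(text, reference_text), cos, polarity])
    return np.array(rows)

# Toy example: two scraped pages, one trusted reference, precomputed polarities.
pages = ["vitamin c cures covid overnight", "vaccines reduce severe covid illness"]
reference = "vaccines reduce the risk of severe covid illness"
polarities = [0.8, 0.1]          # e.g. from any off-the-shelf sentiment analyser
labels = [0, 1]                  # 0 = non-legitimate, 1 = legitimate

X = build_features(pages, reference, polarities)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X))
```

In an incremental setting, the feature-building step would be re-run on newly arriving URLs and the classifier refitted or updated, in the spirit of the incremental learning component mentioned above.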
In the digital world, information dissemination takes place through the 'word of media'. Fraudulent and deceitful content such as misinformation has detrimental effects on people. Automated, fact-based fact-checking is considered an effective way to assess the credibility of content and detect misinformation. Automated fact-checking comprises information retrieval, Natural Language Processing (NLP), and machine learning techniques. Previous studies focused on linguistic and textual features and on similarity-measure-based approaches. However, these studies lack factual knowledge, and similarity measures are less accurate when dealing with sparse or zero-valued data. To fill these gaps, we propose a 'Content Similarity Measure (CSM)' algorithm that performs automated fact-checking of URLs in the healthcare domain. A novel set of features, comprising a content similarity score (CSS), domain-specific features, and a sentiment polarity score, is introduced to perform fact-checking in a journalistic manner. An extensive analysis of the proposed algorithm against standard similarity measures and machine learning classifiers showed that the CSS feature outperformed the other features with an accuracy of 88.26%. In the algorithmic approach, CSM achieved an improved accuracy of 91.06%, compared to 74.26% for the Jaccard similarity measure. Another observation is that the algorithmic approach outperformed the feature-based approach. To check the robustness of the algorithms, the authors tested the model on three state-of-the-art datasets, viz. CoAID, FakeHealth, and ReCOVery. With the algorithmic approach, CSM achieved its highest accuracies of 87.30%, 89.30%, 85.26%, and 88.83% on the CoAID, ReCOVery, FakeHealth (Story), and FakeHealth (Release) datasets, respectively. With the feature-based approach, the proposed CSM achieved its highest accuracies of 85.93%, 87.97%, 83.92%, and 86.80%, respectively. Further, the proposed feature-based approach with the CSS feature improved accuracy by 0.6% compared to the existing study.
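To make the algorithmic (as opposed to feature-based) approach more concrete, the sketch below labels a URL's content by its best content-similarity score against a small corpus of trusted healthcare sources, declaring it legitimate when the score clears a threshold. The trusted sources, threshold value, and scoring details are illustrative assumptions, not the paper's exact CSM definition.

```python
# Hedged sketch of a threshold-based, similarity-driven fact-checking step.
# TRUSTED_SOURCES and the 0.35 threshold are assumptions for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

TRUSTED_SOURCES = [
    "who guidance: vaccines reduce the risk of severe covid illness",
    "cdc guidance: wash hands and wear masks to limit virus spread",
]

def content_similarity_score(candidate_text: str, trusted_texts: list) -> float:
    """Best TF-IDF cosine similarity between the candidate page and trusted sources."""
    tfidf = TfidfVectorizer().fit(trusted_texts + [candidate_text])
    cand_vec = tfidf.transform([candidate_text])
    trusted_vecs = tfidf.transform(trusted_texts)
    return float(cosine_similarity(cand_vec, trusted_vecs).max())

def fact_check(candidate_text: str, threshold: float = 0.35) -> str:
    """Label a page legitimate/non-legitimate from its similarity to trusted sources."""
    score = content_similarity_score(candidate_text, TRUSTED_SOURCES)
    return "legitimate" if score >= threshold else "non-legitimate"

print(fact_check("vaccines greatly reduce severe covid illness"))
print(fact_check("drinking bleach cures covid in one day"))
```

In a fuller version of this idea, the domain-specific and sentiment polarity features mentioned above could be combined with the similarity score, either inside the scoring rule or as inputs to a downstream classifier.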