Reliability Prediction for Health-Related Content: A Replicability Study

Fernández-Pichel, Marcos; Losada, David E.; Pichel, Juan C.; Elsweiler, David

doi:10.1007/978-3-030-72240-1_4

Cited by 8 publications (7 citation statements)

References 36 publications


“…Normalized count of commercial terms: As illustrated in the literature [13], the higher the number of commercial terms, the less credible is perceived the related information, due to the for-profit purpose of such information. At a practical level, a list of 45 commercial terms taken from [72] (such as “sale”, “deal”, “ad”, etc.) has been compiled.…”

Section: Materials and Methods (mentioning, confidence: 99%)

Health Misinformation Detection in the Social Web: An Overview and a Data Science Approach

Sotto; Viviani (2022), IJERPH


The increasing availability of online content these days raises several questions about effective access to information. In particular, the possibility for almost everyone to generate content with no traditional intermediary has, on the one hand, led to a process of “information democratization” and, on the other hand, negatively affected the genuineness of the information disseminated. This issue is particularly relevant when accessing health information, which has an impact at both the individual and the societal level. Often, laypersons do not have sufficient health literacy when deciding whether to rely on this information, and expert users cannot cope with such a large amount of content. For these reasons, there is a need to develop automated solutions that can assist both experts and non-experts in discerning between genuine and non-genuine health information. To contribute to this area, in this paper we study and analyze distinct groups of features and machine learning techniques that can be effective in assessing misinformation in online health-related content, whether in the form of Web pages or social media content. To this aim, and for evaluation purposes, we consider several publicly available datasets that have only recently been generated for the assessment of health misinformation from different perspectives.

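The first citation statement describes a “normalized count of commercial terms” feature built from a list of 45 commercial terms. A minimal sketch follows, assuming the count is normalized by the number of word tokens on the page and that tokens are lowercase alphabetic runs; the quoted statement does not say how the normalization is done, and only the three example terms it mentions are included, not the full list.

```python
import re

# Only "sale", "deal" and "ad" are taken from the quoted statement; the full
# list of 45 commercial terms it refers to is not reproduced here.
COMMERCIAL_TERMS = frozenset({"sale", "deal", "ad"})


def tokenize(text: str) -> list:
    """Split text into lowercase alphabetic word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def normalized_commercial_count(text: str, terms: frozenset = COMMERCIAL_TERMS) -> float:
    """Number of commercial-term tokens divided by the total number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # empty page: no tokens to normalize by
    hits = sum(1 for token in tokens if token in terms)
    return hits / len(tokens)


if __name__ == "__main__":
    page = "Big sale today! This deal on vitamins is the best deal online."
    print(normalized_commercial_count(page))  # 3 hits / 12 tokens -> 0.25
```

Matching whole tokens means that “ads” or “adverts” do not count as “ad”; stemming or a longer term list would change that behavior.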

“…Two other recent works based on the use of handcrafted features and Machine Learning approaches are those described in [15,25]. In [25], a Logistic Regression model for assessing the reliability of Web pages has been trained on labeled data collected w.r.t.…”

Section: Automated Approaches (mentioning, confidence: 99%)

“…Textual features are employed in the form of count-based and TF-IDF word vectors. In [15], a replicability study has been conducted on [36], considering two additional datasets made available in [34,39], and ignoring PageRank features, deemed as not suitable for assessing Web content reliability [30].…”

Section: Automated Approaches (mentioning, confidence: 99%)
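The two statements above describe a Logistic Regression classifier trained on labeled Web pages, with count-based and TF-IDF word vectors as textual features. As a rough, dependency-free illustration of that kind of pipeline (not the code used in any of the cited works), the sketch below builds sparse TF-IDF vectors with a smoothed IDF and fits a logistic regression by batch gradient descent. The tokenizer, the IDF smoothing, the learning rate, the number of epochs and the four toy documents are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic word tokens (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, computed over the training documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector: raw term counts times IDF; terms unseen in training are dropped."""
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    return {term: c * idf[term] for term, c in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, weights, bias):
    """Linear score w . x + b over a sparse vector."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in x.items())


def train_logreg(vectors, labels, epochs=200, lr=0.1):
    """Unregularized logistic regression fitted by batch gradient descent on the log-loss."""
    weights, bias, n = {}, 0.0, len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(score(x, weights, bias)) - y  # d(log-loss)/d(score)
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
            grad_b += err
        for t, g in grad_w.items():
            weights[t] = weights.get(t, 0.0) - lr * g / n
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(x, weights, bias):
    """Estimated probability that the page is reliable (label 1)."""
    return sigmoid(score(x, weights, bias))


# Hypothetical toy training set: 1 = reliable, 0 = unreliable.
DOCS = [
    "peer reviewed clinical trial evidence",
    "clinical guidelines from health agency",
    "miracle cure buy now limited offer",
    "secret miracle remedy doctors hate",
]
LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    idf = fit_idf(DOCS)
    weights, bias = train_logreg([tfidf(d, idf) for d in DOCS], LABELS)
    for query in ("clinical trial evidence", "miracle cure offer"):
        print(query, round(predict_proba(tfidf(query, idf), weights, bias), 3))
```

In practice a library implementation such as scikit-learn's `TfidfVectorizer` and `LogisticRegression` (with regularization) would be the more likely choice; the hand-rolled version only makes the mechanics visible.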

“…Credibility ratings associated with them are provided over a five-point Likert scale, ranging from 1 to 5, where 1 stands for "very non-credible", and 5 for "very credible". In [15], for evaluation purposes, labels have been pre-processed by removing the middle value 3, and mapping 4-5 rating values to credible Web pages and 1-2 rating values to non-credible Web pages. In our approach, we followed the same strategy, and we focused on the 130 available health-related Web pages.…”

Section: Description of the Datasets (mentioning, confidence: 99%)
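The label pre-processing described in the statement above (drop the middle rating 3, map ratings 4-5 to credible and 1-2 to non-credible) is simple enough to state directly. A minimal sketch, assuming ratings arrive as (page_id, rating) pairs and that credible is encoded as 1 and non-credible as 0; the page identifiers in the example are hypothetical.

```python
from typing import List, Optional, Tuple


def likert_to_binary(rating: int) -> Optional[int]:
    """Map a 1-5 credibility rating to 1 (credible), 0 (non-credible) or None (discarded)."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating == 3:
        return None  # the middle value is removed from the evaluation data
    return 1 if rating >= 4 else 0


def binarize(pages: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (page_id, label) pairs, dropping pages whose rating was 3."""
    labeled = []
    for page_id, rating in pages:
        label = likert_to_binary(rating)
        if label is not None:
            labeled.append((page_id, label))
    return labeled


if __name__ == "__main__":
    # Hypothetical page identifiers and ratings.
    print(binarize([("page-a", 5), ("page-b", 3), ("page-c", 1), ("page-d", 4)]))
    # prints [('page-a', 1), ('page-c', 0), ('page-d', 1)]
```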


Health Misinformation Detection in Web Content

Upadhyay; Pasi; Viviani (2021), Proceedings of the Conference on Information Technology for Social Good


In recent years, we have witnessed the proliferation of large amounts of online content generated directly by users with virtually no form of external control, leading to the possible spread of misinformation. The search for effective solutions to this problem is still ongoing, and covers different areas of application, from opinion spam to fake news detection. A more recently investigated scenario, despite the serious risks that disinformation could entail, is that of the online dissemination of health information. Early approaches in this area focused primarily on user-based studies applied to Web page content. More recently, automated approaches have been developed for both Web pages and social media content, particularly with the advent of the COVID-19 pandemic. These approaches are primarily based on handcrafted features extracted from online content in association with Machine Learning. In this scenario, we focus on Web page content, where there is still room for research to study structural-, content- and context-based features to assess the credibility of Web pages. Therefore, this work aims to study the effectiveness of such features in association with a deep learning model, starting from an embedded representation of Web pages that has been recently proposed in the context of phishing Web page detection, i.e., Web2Vec.


“…Technological innovation in the fight against disinformation, as the authors argue, should go beyond discrediting noncredible sources of information and should instead promote more careful information consumption [11]. The literature has reported on successful machine learning models that classify entire articles or information sources [12, 13]. Of note, these models can easily overfit (i.e., obtain high classification accuracy for publications from media outlets present in the training set but fail to generalize to previously unseen media outlets).…”

Section: Introduction (mentioning, confidence: 99%)

Active Annotation in Evaluating the Credibility of Web-Based Medical Information: Guidelines for Creating Training Data Sets for Machine Learning

et al. 2021


Background: The spread of false medical information on the web is rapidly accelerating. Establishing the credibility of web-based medical information has become a pressing necessity. Machine learning offers a solution that, when properly deployed, can be an effective tool in fighting medical misinformation on the web.

Objective: The aim of this study is to present a comprehensive framework for designing and curating machine learning training data sets for web-based medical information credibility assessment. We show how to construct the annotation process. Our main objective is to support researchers from the medical and computer science communities. We offer guidelines on the preparation of data sets for machine learning models that can fight medical misinformation.

Methods: We begin by providing the annotation protocol for medical experts involved in medical sentence credibility evaluation. The protocol is based on a qualitative study of our experimental data. To address the problem of insufficient initial labels, we propose a preprocessing pipeline for the batch of sentences to be assessed. It consists of representation learning, clustering, and reranking. We call this process active annotation.

Results: We collected more than 10,000 annotations of statements related to selected medical subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, and food allergy testing) for less than US $7000 by employing 9 highly qualified annotators (certified medical professionals), and we release this data set to the general public. We developed an active annotation framework for more efficient annotation of noncredible medical statements. The application of qualitative analysis resulted in a better annotation protocol for our future efforts in data set creation.

Conclusions: The results of the qualitative analysis support our claims of the efficacy of the presented method.

