Enhancing Seasonal Influenza Surveillance: Topic Analysis of Widely Used Medicinal Drugs Using Twitter Data

Kagashe, Ireneus; Yan, Zhijun; Suheryani, Imran

doi:10.2196/jmir.7393

Cited by 68 publications

(82 citation statements)

References 61 publications

“…We queried our in-house Twitter adverse drug reaction database first using the original keywords only and then including the variants. The queried dataset consisted of 7.98 million tweets in total with initially unknown numbers of occurrences of each of these terms. Querying using the original keywords retrieved 5579 tweets.…”

Section: Extrinsic Evaluation (mentioning)

confidence: 99%

An unsupervised and customizable misspelling generator for mining noisy health-related text sources

Sarker

Gonzalez-Hernandez

2018

Journal of Biomedical Informatics

Background: Data collection and extraction from noisy text sources such as social media typically rely on keyword-based searching/listening. However, health-related terms are often misspelled in such noisy text sources due to their complex morphology, resulting in the exclusion of relevant data for studies. In this paper, we present a customizable data-centric system that automatically generates common misspellings for complex health-related terms, which can improve the data collection process from noisy text sources.

Materials and Methods: The spelling variant generator relies on a dense vector model learned from large, unlabeled text, which is used to find semantically close terms to the original/seed keyword, followed by the filtering of terms that are lexically dissimilar beyond a given threshold. The process is executed recursively, converging when no new terms similar (lexically and semantically) to the seed keyword are found. The weighting of intra-word character sequence similarities allows further problem-specific customization of the system.

Results: On a dataset prepared for this study, our system outperforms the current state-of-the-art medication name variant generator with best F1-score of 0.69 and F1/4-score of 0.78. Extrinsic evaluation of the system on a set of cancer-related terms showed an increase of over 67% in retrieval rate from Twitter posts when the generated variants are included.

Discussion: Our proposed spelling variant generator has several advantages over the existing spelling variant generators: (i) it is capable of filtering out lexically similar but semantically dissimilar terms, (ii) the number of variants generated is low, as many low-frequency and ambiguous misspellings are filtered out, and (iii) the system is fully automatic, customizable and easily executable. While the base system is fully unsupervised, we show how supervision may be employed to adjust weights for task-specific customizations.

Conclusion: The performance and relative simplicity of our proposed approach make it a much-needed spelling variant generation resource for health-related text mining from noisy sources. The source code for the system has been made publicly available for research.
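
The procedure described in this abstract (semantic neighbours first, lexical filter second, repeated until no new variants appear) can be illustrated with a minimal Python sketch. This is not the authors' released code. The toy two-dimensional vectors, the thresholds, the function names, and the use of difflib's SequenceMatcher ratio in place of the paper's weighted character-sequence similarity are all assumptions made for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy "embedding" table: made-up 2-D vectors, not learned from data.
TOY_VECTORS = {
    "metformin": (1.0, 0.1),     # seed keyword
    "metforman": (0.95, 0.15),   # misspelling
    "metphormin": (0.9, 0.2),    # misspelling
    "metfromin": (0.92, 0.12),   # misspelling
    "insulin": (0.8, 0.3),       # semantically close, lexically different
    "melatonin": (0.1, 1.0),     # semantically distant
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def semantic_neighbors(term, vectors, min_cosine):
    """Terms whose vectors are close to the vector of `term`."""
    base = vectors[term]
    return [w for w, vec in vectors.items()
            if w != term and cosine(base, vec) >= min_cosine]


def lexical_similarity(a, b):
    """Stand-in lexical measure (assumption): SequenceMatcher ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def generate_variants(seed, vectors, min_cosine=0.9, min_lexical=0.8):
    """Expand from the seed until no new lexically similar neighbour is found."""
    variants = set()
    frontier = [seed]
    while frontier:
        term = frontier.pop()
        for cand in semantic_neighbors(term, vectors, min_cosine):
            if cand == seed or cand in variants:
                continue
            # Keep only candidates that still look like the seed keyword.
            if lexical_similarity(seed, cand) >= min_lexical:
                variants.add(cand)
                frontier.append(cand)  # recurse from the newly found variant
    return variants


if __name__ == "__main__":
    print(sorted(generate_variants("metformin", TOY_VECTORS)))
```

In this toy table, "insulin" passes the semantic step but is dropped by the lexical filter, while "melatonin" never passes the semantic step. This mirrors the abstract's point that both conditions must hold.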

“…Medical concept discovery is the basis of healthcare knowledge discovery strategies such as disease surveillance and adverse drug reaction detection. Healthcare knowledge discovery from social media has been validated as viable in previous works [15,16], and can contribute to the sustainability of public health. Therefore, the adoption of the proposed system can directly or indirectly benefit various participants including health consumers, health service providers, and online healthcare platforms, contributing to the sustainability of the virtualized healthcare industry.…”

Section: Discussion (mentioning)

confidence: 99%

“…OHCs can also benefit from entity extraction by attracting more participants to engage in the information exchange platforms. Second, medical entity recognition is an essential task in clinical information extraction and medical knowledge discovery [14], and can facilitate a number of healthcare-related applications such as disease surveillance [15] and adverse drug reaction detection [16]. Early detection of disease activity can reduce the impact of certain diseases such as seasonal influenza with a rapid response [17].…”

Section: Introduction (mentioning)

confidence: 99%

Toward Sustainable Virtualized Healthcare: Extracting Medical Entities from Chinese Online Health Consultations Using Deep Neural Networks

Yang

Gao

2018

Sustainability

Increasingly popular virtualized healthcare services such as online health consultations have significantly changed the way in which health information is sought, and can alleviate geographic barriers, time constraints, and medical resource shortage problems. These online patient–doctor communications have been generating abundant amounts of healthcare-related data. Medical entity extraction from these data is the foundation of medical knowledge discovery, including disease surveillance and adverse drug reaction detection, which can potentially enhance the sustainability of healthcare. Previous studies that focus on health-related entity extraction have certain limitations such as demanding tough handcrafted feature engineering, failing to extract out-of-vocabulary entities, and being unsuitable for the Chinese social media context. Motivated by these observations, this study proposes a novel model named CNMER (Chinese Medical Entity Recognition) using deep neural networks for medical entity recognition in Chinese online health consultations. The designed model utilizes Bidirectional Long Short-Term Memory and Conditional Random Fields as the basic architecture, and uses character embedding and context word embedding to automatically learn effective features to recognize and classify medical-related entities. Exploiting the consultation text collected from a prevalent online health community in China, the evaluation results indicate that the proposed method significantly outperforms the related state-of-the-art models that focus on the Chinese medical entity recognition task. We expect that our model can contribute to the sustainable development of the virtualized healthcare industry.

“…[25] to predict number of Influenza-related hospital visits. Others extracted topics from tweets to enhance seasonal Influenza surveillance [26]. The system in Ref.…”

Section: Related Work (mentioning)

confidence: 99%

Tweetluenza: Predicting flu trends from twitter data

Alkouz

Aghbari

Abawajy

2019

Big Data Min. Anal.

Health authorities worldwide strive to detect Influenza prevalence as early as possible in order to prepare for it and minimize its impacts. To this end, we address the Influenza prevalence surveillance and prediction problem. In this paper, we develop a new Influenza prevalence prediction model, called Tweetluenza, to predict the spread of the Influenza in real time using cross-lingual data harvested from Twitter data streams with emphases on the United Arab Emirates (UAE). Based on the features of tweets, Tweetluenza filters the Influenza tweets and classifies them into two classes, reporting and non-reporting. To monitor the growth of Influenza, the reporting tweets were employed. Furthermore, a linear regression model leverages the reporting tweets to predict the Influenza-related hospital visits in the future. We evaluated Tweetluenza empirically to study its feasibility and compared the results with the actual hospital visits recorded by the UAE Ministry of Health. The results of our experiments demonstrate the practicality of Tweetluenza, which was verified by the high correlation between the Influenza-related Twitter data and hospital visits due to Influenza. Furthermore, the evaluation of the analysis and prediction of Influenza shows that combining English and Arabic tweets improves the correlation results.
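
The regression and correlation steps mentioned in this abstract can be illustrated with a short Python sketch: an ordinary least-squares line from weekly counts of reporting tweets to hospital visits, plus the Pearson correlation between the two series. This is not the Tweetluenza implementation. The function names are hypothetical, and the numbers are synthetic and deliberately perfectly linear so that they cannot be mistaken for real data.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Synthetic illustration only: weekly reporting-tweet counts and hospital visits.
reporting_tweets = [10, 20, 30, 40]
hospital_visits = [25, 45, 65, 85]

slope, intercept = fit_line(reporting_tweets, hospital_visits)
r = pearson_r(reporting_tweets, hospital_visits)

if __name__ == "__main__":
    # Predicted visits for a hypothetical week with 50 reporting tweets.
    print(slope * 50 + intercept, r)
```

With real surveillance data the fit would not be exact, and the correlation would be the quantity of interest when comparing tweet volume against recorded visits.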
