The analysis of microblogging data has been widely used to discover valuable resources for the timely identification of critical illness-related incidents and serious epidemics. Despite numerous efforts in this field, accurate and timely prediction of incidents and outbreaks based on certain clinical symptoms remains a great challenge. Hence, an investigative method can be crucial in characterising a disease state. This study proposes a heuristic mechanism that uses an unsupervised learning technique to efficiently detect disease incidents and outbreaks from tweet content. We categorised the types of emotions that are highly linked to a specific disease and its related terminologies. Emotions (anger, fear, sadness, and joy) and diabetes-related terminologies were extracted using the NRC Affect Intensity Lexicon and a part-of-speech tagging tool. A two-cluster solution was established and validated. The classification result showed that K-means clustering with 2 centroids had the highest classification accuracy (96.53%). The relationship between diabetes-related terms (in the form of tweets) and emotions was established and assessed using the association rules mining technique. The results showed that diabetes-related terms were exclusively associated with fear. This study offers a novel mechanism for disease recognition and outbreak detection in microblogs, which is useful in making informed decisions about a disease state.
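As a rough illustration of the association-rule step, the sketch below computes support and confidence for rules of the form term → emotion. The toy transactions and the helper name `rule_metrics` are hypothetical and are not taken from the study, whose tooling and thresholds are not stated here.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical toy data: each tweet is reduced to a set of items, i.e. the
# diabetes-related terms it contains plus the emotions detected in it.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"insulin", "fear"}),
    frozenset({"insulin", "fear", "sadness"}),
    frozenset({"glucose", "fear"}),
    frozenset({"glucose", "joy"}),
]


def rule_metrics(
    antecedent: str,
    consequent: str,
    transactions: List[FrozenSet[str]],
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(transactions)
    # Tweets containing both the term and the emotion.
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    # Tweets containing the term at all.
    ante = sum(1 for t in transactions if antecedent in t)
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


if __name__ == "__main__":
    for term in ("insulin", "glucose"):
        support, confidence = rule_metrics(term, "fear", TRANSACTIONS)
        print(f"{term} -> fear: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with a confidence of 1.0 means every tweet containing the term also carried the emotion, which is the kind of exclusive association the results describe for fear.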