PurposeThe purpose of the paper is to study multiple viewpoints which are required to access the more informative similarity features among the tweets documents, which is useful for achieving the robust tweets data clustering results.Design/methodology/approachLet “N” be the number of tweets documents for the topics extraction. Unwanted texts, punctuations and other symbols are removed, tokenization and stemming operations are performed in the initial tweets pre-processing step. Bag-of-features are determined for the tweets; later tweets are modelled with the obtained bag-of-features during the process of topics extraction. Approximation of topics features are extracted for every tweet document. These set of topics features of N documents are treated as multi-viewpoints. The key idea of the proposed work is to use multi-viewpoints in the similarity features computation. The following figure illustrates multi-viewpoints based cosine similarity computation of the five tweets documents (here N = 5) and corresponding documents are defined in projected space with five viewpoints, say, v1,v2, v3, v4, and v5. For example, similarity features between two documents (viewpoints v1, and v2) are computed concerning the other three multi-viewpoints (v3, v4, and v5), unlike a single viewpoint in traditional cosine metric.FindingsHealthcare problems with tweets data. Topic models play a crucial role in the classification of health-related tweets with finding topics (or health clusters) instead of finding term frequency and inverse document frequency (TF–IDF) for unlabelled tweets.Originality/valueTopic models play a crucial role in the classification of health-related tweets with finding topics (or health clusters) instead of finding TF-IDF for unlabelled tweets.
Twitter is a popular social network for people to share views or opinions on various topics. Many people search for health topics through Twitter; thus, obtaining a vast amount of social health data from Twitter is possible. Topic models are widely used for social health-care data clustering. These models require prior knowledge about the clustering tendency. Determining the number of clusters of given social health data is known as the health cluster tendency. Visual techniques, including visual assessment of the cluster tendency, cosine-based, and multiviewpoint-based cosine similarity features VAT (MVCS-VAT), are used to identify social health cluster tendencies. The recent MVCS-VAT technique is superior to others; however, it is the most expensive technique for big social health data cluster assessment. Thus, this paper aims to enhance the work of the MVCS-VAT using a sampling technique to address the big social health data assessment problem. Experimental is conducted on different health datasets for demonstrating an efficiency of proposed work. Accuracy of social health data clustering is improved at a rate of 5 to 10% in the proposed S-MVCS-VAT when compared to MVCS-VAT. From obtained results, it also proved that the proposed S-MVCS-VAT is a faster and memory efficient for discovering social health data clusters.
Today, predictions for social use are being made in the growing field of social recommended applications. Twitter is a popular platform because it allows millions of users to express their opinions. One of the most emerging areas of study in social mining for large datasets is healthcare prediction. Applying topic models to healthcare data allows for the derivation of predictive insights. An illness or a symptom of a certain health issue is called an ailment. Condition-based evaluation of millions of tweets is performed using the assistance of ailment topic aspect models. The present topic models, which are Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and Probabilistic LSI (PLSI), are used for an evaluation of medical outcomes for any one of the ailments' aspects restrictions. Except for adverse effects evaluations of therapies, the state-of-the-art ailments topic aspect model (ATAM) solves the issues and provides healthcare findings for the essential features of ailments data. In order to provide healthcare outcomes over a huge quantity of medical data, ATAM's scalability efficiency is compromised. This paper presents intelligent and highly computational extended ATAM that operates in a distributed environment to solve the scaling issue. Its technique is developed on a multi-node Hadoop system's distributed environment for scalable results. Experiments have been carried out using lakhs of tweets on health and diseases to highlight comparisons between the currently used high-performance models and those recommended.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.