PurposeThe purpose of the paper is to study multiple viewpoints which are required to access the more informative similarity features among the tweets documents, which is useful for achieving the robust tweets data clustering results.Design/methodology/approachLet “N” be the number of tweets documents for the topics extraction. Unwanted texts, punctuations and other symbols are removed, tokenization and stemming operations are performed in the initial tweets pre-processing step. Bag-of-features are determined for the tweets; later tweets are modelled with the obtained bag-of-features during the process of topics extraction. Approximation of topics features are extracted for every tweet document. These set of topics features of N documents are treated as multi-viewpoints. The key idea of the proposed work is to use multi-viewpoints in the similarity features computation. The following figure illustrates multi-viewpoints based cosine similarity computation of the five tweets documents (here N = 5) and corresponding documents are defined in projected space with five viewpoints, say, v1,v2, v3, v4, and v5. For example, similarity features between two documents (viewpoints v1, and v2) are computed concerning the other three multi-viewpoints (v3, v4, and v5), unlike a single viewpoint in traditional cosine metric.FindingsHealthcare problems with tweets data. Topic models play a crucial role in the classification of health-related tweets with finding topics (or health clusters) instead of finding term frequency and inverse document frequency (TF–IDF) for unlabelled tweets.Originality/valueTopic models play a crucial role in the classification of health-related tweets with finding topics (or health clusters) instead of finding TF-IDF for unlabelled tweets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.