BACKGROUND
Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort required from well-trained human subject matter experts makes extensive manual social media listening unfeasible. Generative large language models (LLMs) have the potential not only to summarize but also to interpret large amounts of text. It is not clear to what extent LLMs can analyze a large set of social media posts at once, glean subtleties of health-related meaning, and reasonably report on health-related themes.

OBJECTIVE
To assess the feasibility of using LLMs for topic model selection and inductive thematic analysis of large collections of social media posts: can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts?

METHODS
We asked the same research question and used the same set of social media content for both LLM selection of relevant topics and LLM analysis of themes as in a prior, manually conducted, published study of vaccine rhetoric. We compared the results of those prior manual human analyses with results from analyses by the LLMs GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed whether the LLMs had equivalent ability and examined the consistency of repeated analyses from each individual LLM.

RESULTS
Overall, all three LLMs could assess the large corpus of social media posts and summarize its content. The LLMs generally gave high rankings to the topics previously chosen by humans as most relevant. We reject the null hypothesis (P<0.001, overall comparison) and conclude that these LLMs are more likely to include the human-rated top 5 content areas in their own top rankings than would occur by chance. Regarding theme identification, the LLMs identified several themes similar to those identified by humans, with very low hallucination rates. Variability occurred between LLMs and between test runs of an individual LLM. Although they did not consistently match the human-generated themes, the themes the LLMs produced were still judged reasonable and relevant by subject matter experts.

CONCLUSIONS
LLMs can effectively and efficiently process large social media health-related datasets and can extract themes from such data that human subject matter experts deem reasonable. However, we were unable to show that the LLMs we tested can replicate the depth of analysis of human subject matter experts by consistently extracting the same themes from the same data. Once better validated, there is vast potential for automated, LLM-based, real-time social listening for common and rare health conditions, informing public health understanding of the public’s interests and concerns and determining the public’s ideas to address them.

CLINICALTRIAL
Not applicable.
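The abstract does not specify which statistical test underlies the P<0.001 comparison. As an illustration only, the sketch below shows one common way to ask whether the overlap between an LLM's top-ranked topics and the human-rated top 5 content areas exceeds chance, using a hypergeometric model; the topic pool size, list depth, and observed overlap are hypothetical placeholders, not values from the study, and the authors' actual test may differ.

# Illustrative sketch only; not the authors' reported method.
# Asks: if the LLM's top-5 list were a random draw from the candidate topic pool,
# how likely is an overlap at least this large with the human-rated top 5?
from scipy.stats import hypergeom

N_TOPICS = 20         # hypothetical: total candidate content areas ranked
TOP_K = 5             # depth of each "top" list being compared
observed_overlap = 4  # hypothetical: human top-5 topics also in the LLM's top 5

# hypergeom.sf(k-1, M, n, N) gives P(X >= k) for a pool of M topics,
# n "successes" (human top 5), and N draws (LLM top 5).
p_value = hypergeom.sf(observed_overlap - 1, N_TOPICS, TOP_K, TOP_K)
print(f"P(overlap >= {observed_overlap} by chance) = {p_value:.4f}")

With these placeholder values the chance probability is well below 0.01, illustrating how even a modest top-list overlap can be unlikely under a random-ranking null; a permutation test over repeated LLM runs would be a natural alternative.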