Background: Much research is being carried out using publicly available Twitter data in the field of public health, but the types of research questions that these data are being used to answer and the extent to which these projects require ethical oversight are not clear.

Objective: This review describes the current state of public health research using Twitter data in terms of methods and research questions, geographic focus, and ethical considerations including obtaining informed consent from Twitter handlers.

Methods: We implemented a systematic review, following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, of articles published between January 2006 and October 31, 2019, using Twitter data in secondary analyses for public health research, which were found using standardized search criteria on SocINDEX, PsycINFO, and PubMed. Studies were excluded when using Twitter for primary data collection, such as for study recruitment or as part of a dissemination intervention.

Results: We identified 367 articles that met eligibility criteria. Infectious disease (n=80, 22%) and substance use (n=66, 18%) were the most common topics for these studies, and sentiment mining (n=227, 62%), surveillance (n=224, 61%), and thematic exploration (n=217, 59%) were the most common methodologies employed. Approximately one-third of articles had a global or worldwide geographic focus; another one-third focused on the United States. The majority (n=222, 60%) of articles used a native Twitter application programming interface, and a significant number of the remainder (n=102, 28%) used a third-party application programming interface. Only one-third (n=119, 32%) of studies sought ethical approval from an institutional review board, while 17% of them (n=62) included identifying information on Twitter users or tweets and 36% of them (n=131) attempted to anonymize identifiers. Most studies (n=272, 74%) included a discussion on the validity of the measures and reliability of coding (70% for interreliability of human coding and 70% for computer algorithm checks), but less attention was paid to the sampling frame and to what underlying population the sample represented.

Conclusions: Twitter data may be useful in public health research, given its access to publicly available information. However, studies should exercise greater caution in considering the data sources, accession method, and external validity of the sampling frame. Further, an ethical framework is necessary to help guide future research in this area, especially when individual, identifiable Twitter users and tweets are shared and discussed.

Trial Registration: PROSPERO CRD42020148170; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=148170
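The rounded percentages in the Results can be re-derived from the reported counts. The following is a minimal sketch, assuming that all 367 eligible articles form the denominator for every item; the dictionary labels and the helper name `percent_of_total` are illustrative and do not come from the review itself.

```python
# Counts (n) as reported in the Results; the denominator of 367 eligible
# articles is an assumption applied uniformly to every item.
TOTAL_ARTICLES = 367

reported_counts = {
    "infectious disease": 80,
    "substance use": 66,
    "sentiment mining": 227,
    "surveillance": 224,
    "thematic exploration": 217,
    "native Twitter API": 222,
    "third-party API": 102,
    "sought ethical approval": 119,
    "included identifying information": 62,
    "attempted to anonymize identifiers": 131,
    "discussed validity of measures": 272,
}


def percent_of_total(count: int, total: int = TOTAL_ARTICLES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Map each label to its share of the eligible articles.
percentages = {label: percent_of_total(n) for label, n in reported_counts.items()}

for label, pct in percentages.items():
    print(f"{label}: n={reported_counts[label]}, {pct}%")
```

Because several categories overlap (a single article can use more than one methodology), the shares are not expected to sum to 100%.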
BACKGROUND: YouTube has become a popular source of healthcare information reaching an estimated 73% of adults in 2019; approximately 35% of adults in the United States have used the internet to self-diagnose a condition. Public health researchers are therefore incorporating YouTube data in their research, with varying methodologies for sampling, defining measures, and handling ethical concerns.

OBJECTIVE: To understand the types of public health research being implemented with YouTube data and the methodologies and research ethics processes applied to this research.

METHODS: We implemented a systematic review of articles that were published in peer-reviewed journals in English between January 1, 2006 and October 31, 2019 and concerned public health and social media. We extracted data on yearly publication rate, journal impact factor (IF), sampling methods, outcome types, external validity, measures of popularity, presence of user identifying information, institutional review board (IRB) review, and informed consent processes.

RESULTS: This review includes 119 articles from 88 journals. The number of articles published per year increased from two in 2007 to 16 in each of 2016 and 2017 and then declined to approximately 10 in 2019. Median IF of the journals publishing these studies has remained below 5.0 since 2009. The most common public health topics studied were in the categories of chronic diseases other than cancers (n=28, 23.5%), infectious diseases (n=20, 16.8%), and substance use (n=19, 16.0%). Most studies used content analysis to describe the themes of videos (n=89, 74.8%), while others reported on the quality or utility of videos (n=35, 29.4%) and on public opinion or attitudes about video topics (n=31, 26.1%). Few articles scored poorly for quality metrics (n=22, 18.5%). The quality metric most lacking was “validity of measures” (only 6 of 75 studies [8.0%] achieved this metric), followed by “sufficiently rigorous statistical analysis” (14 of 119 studies [11.8%] achieved this metric). The majority (n=82, 68.9%) of articles made no mention of ethical considerations in study design or data collection. Thirty-three (27.7%) contained identifying information about content creators or video commenters. About a quarter of studies sought IRB approval (n=31, 26.1%), but only one sought informed consent from content creators.

CONCLUSIONS: We found great interest in using YouTube to answer public health questions, as indicated by the quantity of articles and the increase in rate of publication over time. However, more careful consideration of study design and thorough validation of outcome measures will strengthen future studies. Debate about the ethics of social media data usage is ongoing. Concrete guidelines on ethical considerations, especially from IRBs, are needed for social media research.

CLINICALTRIAL: PROSPERO Registration Number CRD42020148170.
BACKGROUND: Much research is being done using publicly available Twitter data in the field of public health, but what types of research questions these data are being used to answer and the extent to which these projects require ethical oversight are not clear.

OBJECTIVE: To describe the current state of public health research using Twitter data in terms of methods/research questions, geographic focus, and ethical considerations including informed consent of Twitter handlers.

METHODS: We implemented a systematic review, following PRISMA guidelines, of articles published between January 2006 and October 31, 2019 using Twitter data in secondary analyses for public health research, found using standardized search criteria on SocINDEX, PsycINFO, and/or PubMed. Studies were excluded when using Twitter for primary data collection, such as for study recruitment or as part of a dissemination intervention.

RESULTS: We identified 367 articles that met eligibility criteria. Infectious disease (21.8%) and substance use (18.0%) were the most common topics for these studies, and sentiment mining (61.9%), surveillance (61.0%), and thematic exploration (59.1%) were the most common methodologies employed. About one-third of articles had a global/worldwide geographic focus; another third focused on the United States. The majority (60.5%) of articles used a native Twitter application programming interface (API), and a significant number of the remainder (27.8%) used a third-party API. Only one-third (32.3%) of studies sought IRB approval, while 16.9% included identifying information on Twitter users and/or tweets and 35.7% attempted to anonymize identifiers. Most studies included discussion of the validity of the measures (73.6%) and reliability of coding (69.7% for inter-reliability of human coding and 70.2% for computer algorithm checks), but less attention was paid to the sampling frame and to what underlying population the sample represented.

CONCLUSIONS: Twitter data may be useful in public health research, given its access to publicly available information. However, studies should exercise greater caution in considering the data sources, accession method, and external validity of the sampling frame. Further, an ethical framework is necessary to help guide future research in this area, especially when individual, identifiable Twitter users and tweets are shared and discussed.
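The abstract reports how many studies checked the reliability of human coding but does not say which statistic those studies used. As an illustration only, the sketch below computes Cohen's kappa, one common measure of agreement between two human coders. The function name `cohens_kappa` and the toy sentiment labels are hypothetical and are not data from any reviewed study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment codes assigned to eight tweets by two coders.
coder_1 = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "neu"]
coder_2 = ["pos", "neg", "pos", "pos", "neu", "pos", "neg", "neg"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is often preferred over a simple match rate when a few labels dominate the sample.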