Background
Internalizing, externalizing, and somatoform disorders are the most common and disabling forms of psychopathology. Our understanding of these clinical problems is limited by a reliance on self-report along with research using small samples. Social media has emerged as an exciting channel for collecting a large sample of longitudinal data from individuals to study psychopathology.

Objective
This study reported the results of 2 large ongoing studies in which we collected data from Twitter and self-reported clinical screening scales, the Studies of Online Cohorts for Internalizing Symptoms and Language (SOCIAL) I and II.

Methods
The participants were a sample of Twitter-using adults (SOCIAL I: N=1123) targeted to be nationally representative in terms of age, sex assigned at birth, race, and ethnicity, as well as a sample of college students in the Midwest (SOCIAL II: N=1988), of which 61.78% (1228/1988) were Twitter users. For all participants who were Twitter users, we asked for access to their Twitter handle, which we analyzed using Botometer, which rates the likelihood of an account belonging to a bot. We divided participants into 4 groups: Twitter users who did not give us their handle or gave us invalid handles (invalid), those who denied being Twitter users (no Twitter, only available for SOCIAL II), Twitter users who gave their handles but whose accounts had high bot scores (bot-like), and Twitter users who provided their handles and had low bot scores (valid). We explored whether there were significant differences among these groups in terms of their sociodemographic features, clinical symptoms, and aspects of social media use (ie, platforms used and time).

Results
In SOCIAL I, most individuals were classified as valid (580/1123, 51.65%), and a few were deemed bot-like (190/1123, 16.91%). A total of 31.43% (353/1123) gave no handle or gave an invalid handle (eg, entered “N/A”). In SOCIAL II, many individuals were not Twitter users (760/1988, 38.23%). Of the Twitter users in SOCIAL II (1228/1988, 61.78%), most were classified as either invalid (515/1228, 41.94%) or valid (484/1228, 39.41%), with a smaller fraction deemed bot-like (229/1228, 18.65%). Participants reported high rates of mental health diagnoses as well as high levels of symptoms, especially in SOCIAL II. In general, the differences between individuals who provided or did not provide their social media handles were small and not statistically significant.

Conclusions
Triangulating passively acquired social media data and self-reported questionnaires offers new possibilities for large-scale assessment and evaluation of vulnerability to mental disorders. The propensity of participants to share social media handles is likely not a source of sample bias in subsequent social media analytics.
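
The four-group division described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the `Participant` record, the `classify` function, and the `BOT_SCORE_CUTOFF` value are all hypothetical, and the abstract does not state what bot-score threshold separated "bot-like" from "valid" accounts. The bot score is assumed here to be a likelihood between 0 and 1.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the abstract does not report the threshold actually used.
BOT_SCORE_CUTOFF = 0.5


@dataclass
class Participant:
    """Hypothetical record for one survey respondent."""
    uses_twitter: bool
    handle_is_valid: bool              # False if no handle was given or it was unusable (eg, "N/A")
    bot_score: Optional[float] = None  # assumed bot likelihood in [0, 1]; None without a usable handle


def classify(p: Participant, cutoff: float = BOT_SCORE_CUTOFF) -> str:
    """Assign a participant to one of the four groups named in the abstract."""
    if not p.uses_twitter:
        return "no Twitter"
    if not p.handle_is_valid or p.bot_score is None:
        return "invalid"
    return "bot-like" if p.bot_score >= cutoff else "valid"


# Illustrative (made-up) respondents, one per group.
sample = [
    Participant(uses_twitter=False, handle_is_valid=False),
    Participant(uses_twitter=True, handle_is_valid=False),
    Participant(uses_twitter=True, handle_is_valid=True, bot_score=0.9),
    Participant(uses_twitter=True, handle_is_valid=True, bot_score=0.1),
]
group_counts = Counter(classify(p) for p in sample)
```

Group comparisons on sociodemographic features, symptoms, and social media use would then be run across the labels returned by `classify`.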
Background: Internalizing, externalizing, and somatoform disorders are the most common and disabling forms of psychopathology. Our understanding of these clinical problems is limited by a reliance on self-report along with research using small samples. Social media has emerged as an exciting venue in which to collect a large sample of longitudinal data from individuals to study psychopathology. We report the results of two large ongoing studies in which we collect Twitter data and self-reported clinical screening scales, the Studies of Online Cohorts for Internalizing symptoms and Language (SOCIAL).

Methods: Participants were a sample of Twitter-using adults (SOCIAL-I: N=1123) targeted to be nationally representative in terms of age, sex assigned at birth, and race/ethnicity, as well as a sample of college students in the Midwest (SOCIAL-II: N=1988), of which 61% were Twitter users. For all participants who were Twitter users, we asked for access to their Twitter handle, which we analyzed with Botometer, an online application that rates the likelihood that an account belongs to a bot. We divided participants into four groups: 1) Twitter users who did not give us their handle or gave us invalid handles (“Invalid”), 2) those who denied being Twitter users (“No Twitter,” only available for SOCIAL-II), 3) Twitter users who gave their handles but whose accounts had high bot scores (“Bot-like”), and 4) Twitter users who provided their handles and had low bot scores (“Valid”). We explore whether there are significant differences between these groups in terms of their sociodemographic features, clinical symptoms, and aspects of social media use (i.e., platforms used and time).

Results: In SOCIAL-I, most individuals were classified as valid (n=580) and few were deemed bot-like (n=190). A further 353 gave no handle. In SOCIAL-II, many individuals were not Twitter users (n=760). Of the Twitter users in SOCIAL-II (n=1,455), most were classified as either invalid (n=515) or valid (n=484), with a smaller fraction deemed bot-like (n=229). Participants reported high rates of mental health diagnoses as well as high levels of symptoms, especially in SOCIAL-II. In general, differences between individuals who provided or did not provide their social media handle were small and not statistically significant.

Conclusions: Triangulating passively acquired social media data and self-reported questionnaires offers new possibilities for large-scale assessment and evaluation of vulnerability to mental disorders. The propensity of participants to share social media handles is not likely a source of sample bias in subsequent social media analytics.
BACKGROUND
Internalizing, externalizing, and somatoform disorders are the most common and disabling forms of psychopathology. Our understanding of these clinical problems is limited by a reliance on self-report along with research using small samples. Social media has emerged as an exciting avenue in which to collect a large sample of longitudinal data from individuals to study psychopathology. Nonetheless, there are concerns regarding whether people who share their social media data for research are significantly different from people who do not.

OBJECTIVE
We report the results of two large ongoing studies in which we collect Twitter data and self-reported clinical screening scales, the Studies of Online Cohorts for Internalizing symptoms and Language (SOCIAL). We categorized individuals based on whether they were deemed to have given a valid Twitter account. We described differences in sociodemographic features, clinical symptoms, and aspects of social media use by whether or not individuals gave valid accounts.

METHODS
Participants were a nationally representative sample of Twitter-using adults (SOCIAL-I: N=1,121) as well as a sample of college students in the Midwest (SOCIAL-II: N=2,015), of which 61% were Twitter users. For all participants who were Twitter users, we asked for access to their Twitter handle, which we analyzed with Botometer, an online application that rates the likelihood that an account belongs to a bot. We divided participants into four groups: 1) Twitter users who did not allow access to their account (“No handle”), 2) those who denied being Twitter users (“No Twitter,” only available for SOCIAL-II), 3) Twitter users who gave their handles but whose accounts had high Botometer scores (“Bot-like”), and 4) Twitter users who provided their handles and had low Botometer scores (“Valid”).

RESULTS
In SOCIAL-I, most individuals were classified as valid (n=580) and few were deemed bot-like (n=190). A further 351 gave no handle. In SOCIAL-II, many individuals were not Twitter users (n=760). Of the Twitter users in SOCIAL-II (n=1,455), most were classified as either invalid (n=515) or valid (n=484), with a smaller fraction deemed bot-like (n=229). Participants reported high rates of mental health diagnoses as well as high levels of symptoms, especially in SOCIAL-II. In general, differences between individuals who provided or did not provide their social media handle were small and not statistically significant.

CONCLUSIONS
Triangulating passively acquired social media data and self-reported questionnaires offers avenues for large-scale assessment and evaluation of vulnerability to mental disorders. The propensity of participants to share social media handles is not likely a source of sample bias in subsequent social media analytics.