Not Just Depressed: Bipolar Disorder Prediction on Reddit

Sekulić, Ivan; Gjurković, Matej; Šnajder, Jan

doi:10.18653/v1/w18-6211

Cited by 41 publications

(42 citation statements)

References 25 publications


“…Besides MH-specific platforms (Kramer et al, 2004; Vayreda and Antaki, 2009; Bauer et al, 2013; Latalova et al, 2014; Poole et al, 2015; McDonald and Woodward-Kron, 2016; Campbell and Campbell, 2019), blogs (Mandla et al, 2017), and Twitter (Coppersmith et al, 2014; Ji et al, 2015; Saravia et al, 2016; Budenz et al, 2019; Huang et al, 2019), much recent research of user-generated online content in BD has focused on the international online discussion forum Reddit (Gkotsis et al, 2016, 2017; Cohan et al, 2018; Sekulić et al, 2018; Sahota and Sankar, 2019; Yoo et al, 2019).…”

Section: The Online Discussion Forum Reddit (mentioning)

confidence: 99%

“…Online forums have become an increasingly attractive source for research data, enabling non-reactive data collection, where researchers do not influence data creation, at large scale (Fielding et al, 2016). Natural language processing (NLP) research in this area has focused on predicting people at risk of BD (Coppersmith et al, 2014; Cohan et al, 2018; Sekulić et al, 2018). Health researchers have explored the lived experience of BD with qualitative analyses of online posts (Mandla et al, 2017; Sahota and Sankar, 2019).…”

Section: Online Forums As Research Data Source (mentioning)

confidence: 99%


Understanding who uses Reddit: Profiling individuals with a self-reported bipolar disorder diagnosis

Jagfeld, Lobban, Rayson, et al. 2021

Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access


Recently, research on mental health conditions using public online data, including Reddit, has surged in NLP and health research but has not reported user characteristics, which are important to judge generalisability of findings. This paper shows how existing NLP methods can yield information on clinical, demographic, and identity characteristics of almost 20K Reddit users who self-report a bipolar disorder diagnosis. This population consists of slightly more feminine- than masculine-gendered, mainly young or middle-aged, US-based adults who often report additional mental health diagnoses, which is compared with general Reddit statistics and epidemiological studies. Additionally, this paper carefully evaluates all methods and discusses ethical issues.


“…They used a combination of N-gram language modeling and Linguistic Inquiry and Word Count (LIWC) [11]. LIWC has successfully aided in the detection of depression from Twitter activity with an accuracy of 70% [12] and of bipolar disorder from Reddit posts [8]. These models typically focus on binary classification of posts with respect to a single disorder or subreddit (eg, was the post made on r/Anxiety or a control subreddit).…”

Section: Introduction (mentioning)

confidence: 99%

Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study

Low, Rumker, Talkar, et al. 2020

J Med Internet Res


Background The COVID-19 pandemic is impacting mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. Objective The aim of this study is to leverage natural language processing (NLP) with the goal of characterizing changes in 15 of the world’s largest mental health support groups (eg, r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with 11 non–mental health groups (eg, r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. Methods We created and released the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyzed trends from 90 text-derived features such as sentiment analysis, personal pronouns, and semantic categories. Using supervised machine learning, we classified posts into their respective support groups and interpreted important features to understand how different problems manifest in language. We applied unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results We found that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately 2 months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress,” “isolation,” and “home,” while others such as “motion” significantly decreased. We found that support groups related to attention-deficit/hyperactivity disorder, eating disorders, and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. 
For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discovered that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ=–0.96, P<.001). Using unsupervised clustering, we found the suicidality and loneliness clusters more than doubled in the number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and posttraumatic stress disorder became significantly associated with the suicidality cluster. Furthermore, clusters surrounding self-harm and entertainment emerged. Conclusions By using a broad set of NLP techniques and analyzing a baseline of prepandemic posts, we uncovered patterns of how specific mental health problems manifest in language, identified at-risk users, and revealed the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrated that textual analysis is sensitive to uncover mental health complaints as they appear in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests.


“…Using computational linguistics, researchers have been able to leverage the widespread use of social media to analyze large, publicly available datasets for identifying linguistic markers of mental illness. To date, unique linguistic markers and patterns have been identified for several psychiatric conditions, such as major depressive disorder (MDD) (De Choudhury et al, 2013; Vedula and Parthasarathy, 2017), general anxiety disorder (GAD) (Shen and Rudzicz, 2017), bipolar disorder (BD) (Huang et al, 2017; Sekulić et al, 2018), eating disorders (ED) (Mohammadi et al, 2019; Naderi et al, 2019), schizophrenia (SZ) (Mitchell et al, 2015; Birnbaum et al, 2017; Zomick et al, 2019), obsessive compulsive disorder (OCD) (Coppersmith et al, 2015a), posttraumatic stress disorder (PTSD) (Coppersmith et al, 2014), as well as others (Coppersmith et al, 2015a). Linguistic findings have spanned various domains of language, including the use of pronouns, emotion words, tentative language, tangentiality, punctuation, and content analysis.…”

Section: Introduction (mentioning)

confidence: 99%

Detection of Mental Health from Reddit via Deep Contextualized Representations

Jiang, Levitan, Zomick, et al. 2020

Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis


We address the problem of automatic detection of psychiatric disorders from the linguistic content of social media posts. We build a large scale dataset of Reddit posts from users with eight disorders and a control user group. We extract and analyze linguistic characteristics of posts and identify differences between diagnostic groups. We build strong classification models based on deep contextualized word representations and show that they outperform previously applied statistical models with simple linguistic features by large margins. We compare user-level and post-level classification performance, as well as an ensembled multiclass model.

