BACKGROUND
Mental health disorders are currently the main contributor to poor quality of life and years lived with disability. Symptoms common to many mental health disorders lead to impairments or changes in the use of language, which are observable in the routine use of social media. Detection of these linguistic cues has been explored throughout the last quarter-century, but interest and methodological development have burgeoned following the COVID-19 pandemic. The next decade may see the development of reliable methods for predicting mental health status using social media data. This might have implications for clinical practice and public health policy, particularly in the context of early intervention in mental health care.
OBJECTIVE
This study examines the state of the art in methods for predicting the mental health status of social media users. Our focus is the development of AI-driven methods, particularly natural language processing (NLP), for analyzing large volumes of written text. We also detail constraints affecting research in this area. These include the dearth of high-quality public data sets for methodological benchmarking and the need to adopt ethical and privacy frameworks that acknowledge the stigma and vulnerability of those affected by mental illness.
METHODS
A Google Scholar search yielded peer-reviewed articles dated between 1999 and 2024. We manually grouped the articles into four primary areas of interest: data sets on social media and mental health, methods for predicting mental health status, longitudinal analyses of mental health, and ethical aspects of mental health data and its analysis. Selected articles from these groups formed the basis of our narrative review.
RESULTS
Larger data sets that include precise dates of subjects’ diagnoses are needed to support the development of methods for predicting mental health status, particularly in severe disorders such as schizophrenia. Inviting participants to donate their social media data for research purposes could help overcome widespread ethical and privacy concerns. In any event, multimodal methods for predicting mental health status appear likely to provide advancements that may not be achievable using NLP alone.
CONCLUSIONS
Multimodal methods for predicting mental health status from social media data need to be further developed before they can be considered for adoption in health care or medical support, or as consumer-facing products. For this to be achieved, more high-quality social media data sets need to be made available, and privacy concerns regarding the use of such data must be formally addressed. A review of the literature on the effects of social media use on users’ depression and anxiety is also merited.