Background
Suicide risk assessment usually involves an interaction between doctors and patients. However, a significant number of people with mental disorders receive no treatment for their condition due to the limited access to mental health care facilities; the reduced availability of clinicians; the lack of awareness; and stigma, neglect, and discrimination surrounding mental disorders. In contrast, internet access and social media usage have increased significantly, providing experts and patients with a means of communication that may contribute to the development of methods to detect mental health issues among social media users.
Objective
This paper aimed to describe an approach for the suicide risk assessment of Spanish-speaking users on social media. We aimed to explore behavioral, relational, and multimodal data extracted from multiple social platforms and develop machine learning models to detect users at risk.
Methods
We characterized users based on their writings, posting patterns, relations with other users, and images posted. We also evaluated statistical and deep learning approaches to handle multimodal data for the detection of users with signs of suicidal ideation (suicidal ideation risk group). Our methods were evaluated over a dataset of 252 users annotated by clinicians. To evaluate the performance of our models, we distinguished 2 control groups: users who make use of suicide-related vocabulary (focused control group) and generic random users (generic control group).
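As a rough illustration of how textual and behavioral attributes of a user might be combined into a single representation, the following minimal sketch concatenates bag-of-words counts with two simple attributes (number of friends and median post length). The function names, the vocabulary, the toy posts, and the choice of simple feature concatenation are all assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter
from statistics import median


def bag_of_words(posts, vocabulary):
    """Count how often each vocabulary term appears across a user's posts."""
    counts = Counter(word for post in posts for word in post.lower().split())
    return [counts[term] for term in vocabulary]


def behavioral_features(posts, n_friends):
    """Two illustrative attributes: number of friends and median post length in words."""
    lengths = [len(post.split()) for post in posts]
    return [n_friends, median(lengths)]


def user_vector(posts, n_friends, vocabulary):
    """Concatenate text features with behavioral features (an assumed fusion strategy)."""
    return bag_of_words(posts, vocabulary) + behavioral_features(posts, n_friends)


# Hypothetical toy user; not data from the study.
vocabulary = ["tired", "alone", "happy"]
posts = ["so tired and alone", "tired again today"]
vector = user_vector(posts, n_friends=120, vocabulary=vocabulary)
print(vector)
```

A vector of this kind could then be passed to any standard classifier; visual and relational features would be appended in the same way.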
Results
We identified statistically significant differences between the textual and behavioral attributes of each control group and those of the suicidal ideation risk group. At the 95% confidence level, the number of friends (P=.04) and median tweet length (P=.04) differed significantly between the suicidal ideation risk group and the focused control group. The median number of friends was higher for focused control users (median 578.5) than for users at risk (median 372.0). Similarly, the median tweet length was higher for focused control users (16 words) than for suicidal ideation risk users (13 words). Our findings also show that combining textual, visual, relational, and behavioral data yields higher accuracy than using each modality separately. We defined text-based baseline models based on bag of words and word embeddings, which were outperformed by our models, obtaining an increase in accuracy of up to 8% when distinguishing users at risk from both types of control users.
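The abstract does not name the statistical test behind the reported P values. As one possible way to compare the median of an attribute between 2 groups, the sketch below uses a two-sided permutation test on the difference in medians. The function name, its parameters, and the toy values are hypothetical and serve only as an illustration; this is not necessarily the test used in the study.

```python
import random
from statistics import median


def median_permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in medians (illustrative only)."""
    rng = random.Random(seed)
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the median difference.
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated P value strictly above 0.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical tweet lengths in words; not data from the study.
risk_group = [11, 12, 13, 13, 14, 12, 13, 15]
control_group = [15, 16, 17, 16, 18, 16, 15, 17]
p_value = median_permutation_test(risk_group, control_group)
print(f"P={p_value:.3f}")
```

A P value below .05 would indicate a difference that is significant at the 95% confidence level, under the assumptions of the permutation test.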
Conclusions
The types of attributes analyzed are significant for detecting users at risk, and their combination outperforms generic, exclusively text-based baseline models. After evaluating the contribution of image-based predictive models, we believe that our results can be improved by enhancing the models based on textual and relational features. These methods can be extended and applied to different use cases related to other mental disorders.