Background Previous research has shown the feasibility of using machine learning models trained on social media data from a single platform (eg, Facebook or Twitter) to distinguish individuals either with a diagnosis of mental illness or experiencing an adverse outcome from healthy controls. However, the performance of such models on data from novel social media platforms unseen in the training data (eg, Instagram and TikTok) has not been investigated in previous literature. Objective Our study examined the feasibility of building machine learning classifiers that can effectively predict an upcoming psychiatric hospitalization given social media data from platforms unseen in the classifiers’ training data despite the preliminary evidence on identity fragmentation on the investigated social media platforms. Methods Windowed timeline data of patients with a diagnosis of schizophrenia spectrum disorder before a known hospitalization event and healthy controls were gathered from 3 platforms: Facebook (254/268, 94.8% of participants), Twitter (51/268, 19% of participants), and Instagram (134/268, 50% of participants). We then used a 3 × 3 combinatorial binary classification design to train machine learning classifiers and evaluate their performance on testing data from all available platforms. We further compared results from models in intraplatform experiments (ie, training and testing data belonging to the same platform) to those from models in interplatform experiments (ie, training and testing data belonging to different platforms). Finally, we used Shapley Additive Explanation values to extract the top predictive features to explain and compare the underlying constructs that predict hospitalization on each platform. 
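The 3 × 3 design described in the Methods can be sketched as a loop over every (training platform, testing platform) pair, with the 3 diagonal pairs forming the intraplatform experiments and the 6 off-diagonal pairs forming the interplatform experiments. The sketch below is only an illustration under stated assumptions: the names `run_grid`, `summarize`, `fit`, and `predict` are hypothetical, and the abstract does not specify the classifiers, features, or data splits the authors actually used.

```python
from itertools import product
from statistics import mean

PLATFORMS = ["facebook", "twitter", "instagram"]


def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def run_grid(train_sets, test_sets, fit, predict):
    """Train on each platform and test on each platform.

    train_sets / test_sets: {platform: (X, y)} (hypothetical layout).
    fit(X, y) -> model and predict(model, X) -> labels are caller-supplied.
    Returns {(train_platform, test_platform): F1}.
    """
    scores = {}
    for train_name, test_name in product(PLATFORMS, repeat=2):
        X_train, y_train = train_sets[train_name]
        X_test, y_test = test_sets[test_name]
        model = fit(X_train, y_train)
        scores[(train_name, test_name)] = f1_score(y_test, predict(model, X_test))
    return scores


def summarize(scores):
    """Average F1 over intraplatform (diagonal) and interplatform (off-diagonal) cells."""
    intra = [s for (train, test), s in scores.items() if train == test]
    inter = [s for (train, test), s in scores.items() if train != test]
    return mean(intra), mean(inter)
```

Comparing the two averages returned by `summarize` corresponds to the intraplatform versus interplatform comparison reported in the abstract.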
Results We found that models in intraplatform experiments on average achieved an F1-score of 0.72 (SD 0.07) in predicting a psychiatric hospitalization because of schizophrenia spectrum disorder, which is 68% higher than the average of models in interplatform experiments at an F1-score of 0.428 (SD 0.11). When investigating the key drivers for divergence in construct validities between models, an analysis of top features for the intraplatform models showed both low predictive feature overlap between the platforms and low pairwise rank correlation (<0.1) between the platforms’ top feature rankings. Furthermore, low average cosine similarity of data between platforms within participants in comparison with the same measurement on data within platforms between participants points to evidence of identity fragmentation of participants between platforms. Conclusions We demonstrated that models built on one platform’s data to predict critical mental health treatment outcomes such as hospitalization do not generalize to another platform. In our case, this is because different social media platforms consistently reflect different segments of participants’ identities. With the changing ecosystem of social media use among different demographic groups and as web-based identities continue to become fragmented across platforms, further research on holistic approaches to harnessing these diverse data sources is required.
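The identity fragmentation comparison in the Results can be illustrated with a small sketch: average the cosine similarity of one participant's data across platforms, then compare it with the average similarity of different participants' data on the same platform. The nested dictionary layout and the function names here are assumptions made for illustration; the abstract does not describe how the authors represented each timeline as a vector.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def within_participant_between_platforms(features):
    """features: {participant: {platform: vector}} (hypothetical layout).

    Averages similarity over every pair of platforms used by the same participant.
    Raises statistics.StatisticsError if no participant has two or more platforms.
    """
    sims = []
    for by_platform in features.values():
        for p, q in combinations(sorted(by_platform), 2):
            sims.append(cosine(by_platform[p], by_platform[q]))
    return mean(sims)


def within_platform_between_participants(features):
    """Averages similarity over every pair of participants on the same platform."""
    by_platform = {}
    for vectors in features.values():
        for platform, vec in vectors.items():
            by_platform.setdefault(platform, []).append(vec)
    sims = [
        cosine(u, v)
        for vecs in by_platform.values()
        for u, v in combinations(vecs, 2)
    ]
    return mean(sims)
```

Under this reading, a within-participant value that is low relative to the within-platform value is the pattern the abstract interprets as evidence of fragmentation.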
BACKGROUND Previous research has shown the feasibility of using social media data from a single platform (e.g., Facebook or Twitter) to distinguish individuals with a diagnosis of mental illness, or experiencing an adverse outcome, from healthy volunteers. However, the performance of these models on data from other social media platforms unseen in the training data (e.g., Instagram, TikTok) has not been investigated. OBJECTIVE This study aims to explore whether, with online identities fragmented across social media platforms, models perform better on testing data from platforms seen during training than on data from unseen platforms. It also aims to explain any such discrepancies in performance. METHODS Windowed timeline data with clinically verified labels of hospitalization among patients with a diagnosis of schizophrenia were gathered from three platforms: Facebook (N = 254), Twitter (N = 54), and Instagram (N = 124). We then used a 3 × 3 combinatorial binary classification design to evaluate the models' performance on testing data from all available platforms. We further compared results from models in intra-platform experiments (i.e., training and testing data belong to the same platform) with those from models in inter-platform experiments (i.e., training and testing data belong to different platforms). Finally, we used SHapley Additive exPlanations (SHAP) to extract the top predictive features and explain the underlying constructs that predict hospitalization on each platform. RESULTS We found that models in intra-platform experiments achieved an average F1-score of 0.72 in detecting a psychiatric hospitalization due to schizophrenia, which is 68% higher than the average F1-score of 0.428 for models in inter-platform experiments.
We also found that combining training data from all three platforms yielded a slight average improvement of 0.5% on the testing sets compared with the original intra-platform models. An analysis of top features for the intra-platform models showed low predictive feature overlap between the platforms, with ‘anger’ being a unique top feature for Facebook and ‘sad’ a unique top feature for Instagram. CONCLUSIONS We demonstrated that models built on one platform’s data to predict critical mental health treatment outcomes, such as hospitalization, may not generalize to another platform because each platform offers different construct validity. However, combining data from multiple platforms may offer a more comprehensive view of a patient’s state and situation and therefore fare better in relapse prediction. With the changing ecosystem of social media use among different demographic groups and as online identities continue to fragment across platforms, further research on holistic approaches to harnessing these diverse data sources is required.
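The feature overlap analysis can be illustrated with a short sketch that takes per-platform feature importances (for example, mean absolute SHAP values), keeps the top k features for each platform, and reports their Jaccard overlap and the features unique to each platform. The function names and the dictionary layout are hypothetical, and Jaccard overlap is a swapped-in measure chosen for illustration; the abstract does not state how the authors quantified overlap.

```python
def top_k(importances, k):
    """Names of the k features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def jaccard(a, b):
    """Jaccard overlap of two feature lists; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def unique_features(top_by_platform):
    """For each platform, the top features that appear in no other platform's list."""
    result = {}
    for platform, feats in top_by_platform.items():
        others = set()
        for other_platform, other_feats in top_by_platform.items():
            if other_platform != platform:
                others.update(other_feats)
        result[platform] = set(feats) - others
    return result
```

With real importances, a low Jaccard value together with non-empty unique sets would match the pattern described above, where ‘anger’ appears only among Facebook's top features and ‘sad’ only among Instagram's.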