Background
Substance misuse presents significant global public health challenges. Understanding transitions between substance types and the timing of shifts to polysubstance use is vital to developing effective prevention and recovery strategies. The gateway hypothesis suggests that high-risk substance use is preceded by lower-risk substance use. However, the source of this correlation is hotly contested. While some claim that low-risk substance use causes subsequent, riskier substance use, most people using low-risk substances also do not escalate to higher-risk substances. Social media data hold the potential to shed light on the factors contributing to substance use transitions.
Objective
By leveraging social media data, our study aimed to gain a better understanding of substance use pathways. By identifying and analyzing the transitions of individuals between different risk levels of substance use, our goal was to find specific linguistic cues in individuals’ social media posts that could indicate escalating or de-escalating patterns in substance use.
Methods
We conducted a large-scale analysis using data from Reddit, collected between 2015 and 2019, consisting of over 2.29 million posts and approximately 29.37 million comments by around 1.4 million users from subreddits. These data, derived from substance use subreddits, facilitated the creation of a risk transition data set reflecting the substance use behaviors of over 1.4 million users. We deployed deep learning and machine learning techniques to predict the escalation or de-escalation transitions in risk levels, based on initial transition phases documented in posts and comments. We conducted a linguistic analysis to analyze the language patterns associated with transitions in substance use, emphasizing the role of n-gram features in predicting future risk trajectories.
Results
Our results showed promise in predicting the escalation or de-escalation transition in risk levels, based on the historical data of Reddit users created on initial transition phases among drug-related subreddits, with an accuracy of 78.48% and an F1-score of 79.20%. We highlighted the vital predictive features, such as specific substance names and tools indicative of future risk escalations. Our linguistic analysis showed that terms linked with harm reduction strategies were instrumental in signaling de-escalation, whereas descriptors of frequent substance use were characteristic of escalating transitions.
Conclusions
This study sheds light on the complexities surrounding the gateway hypothesis of substance use through an examination of web-based behavior on Reddit. While certain findings validate the hypothesis, indicating a progression from lower-risk substances such as marijuana to higher-risk ones, a significant number of individuals did not show this transition. The research underscores the potential of using machine learning with social media analysis to predict substance use transitions. Our results point toward future directions for leveraging social media data in substance use research, underlining the importance of continued exploration before suggesting direct implications for interventions.