Background: Risky sexual behavior (RSB), the most direct risk factor for sexually transmitted infections (STIs), is common among college students. Identifying relevant risk factors and predicting RSB are therefore important for intervention and prevention in this population.

Objective: We aim to establish a predictive model for RSB among college students to facilitate timely intervention and prevention of RSB and thereby help limit STI contraction.

Methods: We included a total of 8794 heterosexual Chinese students who self-reported engaging in sexual intercourse from November 2019 to February 2020. We identified RSB among those students and attributed it to 4 dimensions: whether contraception was used, whether the contraceptive method was safe, whether students engaged in casual sex or sex with multiple partners, and integrated RSB (which combined the first 3 dimensions). Overall, 126 predictors were included in this study, covering demographic characteristics, daily habits, physical and mental health, relationship status, sexual knowledge, sexual education, sexual attitude, and previous sexual experience. For each type of RSB, we compared 8 machine learning (ML) models: multiple logistic regression (MLR), naive Bayes (BYS), linear discriminant analysis (LDA), random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), deep learning (DL), and the ensemble model. The optimal model for both RSB prediction and risk factor identification was selected based on a set of validation indicators. An MLR model was then applied to investigate the association between RSB and the risk factors identified through the ML methods.

Results: In total, 5328 (60.59%) students were found to have previously engaged in RSB. Among them, 3682 (41.87%) did not use contraception every time they had sexual intercourse, 3602 (40.96%) had previously used an ineffective or unsafe contraceptive method, and 1157 (13.16%) had engaged in casual sex or sex with multiple partners. XGBoost achieved the optimal predictive performance on all 4 types of RSB, with the area under the receiver operating characteristic curve (AUROC) reaching 0.78, 0.72, 0.94, and 0.80 for contraceptive use, safe contraceptive method use, casual sex or sex with multiple partners, and integrated RSB, respectively. While keeping the validation indicators stable, the 12 most predictive variables were then selected using XGBoost, including the participants’ relationship status, sexual knowledge, sexual attitude, and previous sexual experience. Through MLR, RSB was found to be significantly associated with less sexual knowledge, more liberal sexual attitudes, single relationship status, and increased sexual experience.

Conclusions: RSB is prevalent among college students. The XGBoost model is an effective approach to predict RSB and identify corresponding risk factors. This study presents an opportunity to promote sexual and reproductive health through ML models, which can support targeted interventions aimed at different subgroups as well as the precise surveillance and prevention of RSB among college students through risk probability prediction.
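The AUROC values reported above can be read as the probability that a randomly chosen student with RSB receives a higher predicted risk than a randomly chosen student without it. The following is a minimal sketch of that pairwise definition in plain Python, not the authors' evaluation code; the function name `auroc` and the toy labels and scores are assumptions made for illustration only.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    `labels` holds 1 (RSB) or 0 (no RSB); `scores` holds predicted risks.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 3 positives, 3 negatives.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
print(round(toy_auroc, 3))
```

The pairwise loop is quadratic in sample size, which is fine for illustration; on a cohort of several thousand students a rank-based implementation from a statistics library would be the more practical choice.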
BACKGROUND: Risky sexual behavior (RSB), the most direct risk factor for sexually transmitted infections (STIs), is common among college students. It is therefore important to identify relevant risk factors and predict RSB so that it can be addressed through intervention and prevention.

OBJECTIVE: We aimed to establish a predictive model for RSB among college students to facilitate timely prevention and intervention before contraction of STIs.

METHODS: We included a total of 8,290 heterosexual Chinese students who self-reported sexual intercourse experience from November 2019 to February 2020. We identified RSB among those students and attributed it to four dimensions: whether contraception was used; whether the contraceptive method was safe; whether students engaged in casual sex or sex with multiple partners; and integrated RSB, which combined the first three dimensions. For each type, we compared various machine learning (ML) models according to multiple validation indicators and chose the optimal model for both RSB prediction and risk factor identification.

RESULTS: In total, 4993 (60·2%) students had ever engaged in RSB. Among them, 3422 (41·3%) did not use contraception every time they had sexual intercourse, 3393 (40·93%) had ever used an unsafe contraceptive method, and 1069 (12·9%) had casual sex or sex with multiple partners. In the comparison, the XGBoost (XGB) and gradient boosting machine (GBM) models achieved the optimal predictive performance on integrated RSB, with an area under the receiver operating characteristic curve (AUC) reaching 0·80. While keeping the validation indicators stable, the 12 most predictive variables were finally selected by XGB, including participants’ relationship status, sexual knowledge, sexual attitude, and previous sexual experience.

CONCLUSIONS: RSB is prevalent among college students, and ML is an effective approach to predict RSB and identify corresponding risk factors.
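The integrated RSB outcome is described only as combining the first three dimensions. One plausible reading, sketched below, is a logical OR: a student counts as having integrated RSB if any of the three component behaviors is present. This rule is an assumption, not something the abstract states, and the class name `Response`, its field names, and the toy records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical field names; the abstract does not give variable names.
    inconsistent_contraception: bool   # did not use contraception every time
    unsafe_method: bool                # ever used an unsafe contraceptive method
    casual_or_multiple_partners: bool  # casual sex or sex with multiple partners


def integrated_rsb(r: Response) -> bool:
    """Assumed rule: integrated RSB is present if any component is present."""
    return (
        r.inconsistent_contraception
        or r.unsafe_method
        or r.casual_or_multiple_partners
    )


def prevalence_percent(flags) -> float:
    """Share of True flags, expressed as a percentage."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)


# Toy records (not study data): three of four meet the assumed definition.
toy = [
    Response(True, False, False),
    Response(False, False, False),
    Response(False, True, True),
    Response(False, False, True),
]
toy_prevalence = prevalence_percent(integrated_rsb(r) for r in toy)
print(toy_prevalence)
```

Under an OR rule the integrated prevalence can never be lower than the prevalence of any single dimension, which is consistent with the reported integrated figure exceeding each of the three component figures.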