Introduction
Cigarette smoking continues to pose a threat to public health. Identifying individual risk factors for smoking initiation is essential to further mitigate this epidemic. To our knowledge, no study to date has used Machine Learning (ML) techniques to automatically uncover informative predictors of smoking onset among adults using the Population Assessment of Tobacco and Health (PATH) study.
Methods
In this work, we employed Random Forest paired with Recursive Feature Elimination to identify relevant PATH variables that predict smoking initiation between two consecutive PATH waves among adults who had never smoked at baseline. We included all potentially informative baseline variables in wave 1 (or wave 4) to predict past 30-day smoking status in wave 2 (or wave 5, respectively). Using the first and the most recent pairs of PATH waves was found to be sufficient to identify the key risk factors of smoking initiation and to test their robustness over time. The eXtreme Gradient Boosting method was employed to test the quality of the selected variables.
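The selection-then-validation workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), the feature counts and the elimination step size are arbitrary, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for eXtreme Gradient Boosting so that the sketch needs only one library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for baseline-wave predictors (X) and next-wave
# past 30-day smoking status (y); the PATH data are not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=6, n_redundant=2,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive Feature Elimination: repeatedly fit the forest and drop the
# least important features until n_features_to_select remain.
# Both n_features_to_select and step are illustrative choices.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=10,
    step=5,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)

# Refit a separate boosted-tree model (stand-in for XGBoost) on the
# retained features only, and score it on held-out data.
booster = GradientBoostingClassifier(random_state=0)
booster.fit(X_train[:, selected], y_train)
auc = roc_auc_score(y_test, booster.predict_proba(X_test[:, selected])[:, 1])
print(f"kept {selected.size} features, held-out AUC = {auc:.2f}")
```

Evaluating the retained features with a second, different model family is one way to check that they carry signal beyond the model that selected them.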
Results
The classification models identified about 60 informative PATH variables among the many candidate variables in each baseline wave. With these selected predictors, the resulting models had high discriminatory power, with an area under the Specificity-Sensitivity curve of around 80%. Examining the selected variables, we found that two factors, (i) BMI and (ii) dental/oral health status, robustly appeared across the considered waves as important predictors of smoking initiation, alongside other well-established predictors.
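For readers unfamiliar with the discrimination metric, the sketch below computes an area under the curve from scratch, assuming the "area under the Specificity-Sensitivity curve" refers to the usual ROC AUC. It uses the rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = initiated smoking, 0 = did not (made-up values).
labels = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.05]
print(round(roc_auc(labels, scores), 3))
```

In this toy example, 13 of the 15 positive-negative pairs are ranked correctly, so the value is 13/15. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.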
Conclusions
Our work demonstrates that ML methods are useful for predicting smoking initiation with high accuracy, identifying novel predictors of smoking initiation, and enhancing our understanding of tobacco use behaviors.
Implications
Understanding individual risk factors for smoking initiation is essential to preventing it. With this methodology, a set of the most informative predictors of smoking onset in the PATH data was identified. Besides reconfirming well-known risk factors, the findings suggested additional predictors of smoking initiation that have been overlooked in previous work. More studies that focus on the newly discovered factors (BMI and dental/oral health status) are needed to confirm their predictive power for the onset of smoking and to determine the underlying mechanisms.
Introduction
With the US Food and Drug Administration recently proposing to implement a ban on menthol cigarettes, it is critical to estimate the potential public health effects of such a ban. With high rates of menthol cigarette use and important smoking-related health disparity implications, the impact of the ban on the non-Hispanic black (NHB) population merits strong consideration.
Methods
We apply the previously developed Menthol Smoking and Vaping Model to the NHB population. A status quo scenario is developed using NHB-specific population, smoking and vaping initiation, cessation and death rates. Estimates from a recent expert elicitation on behavioural impacts of a menthol cigarette ban on the NHB population are used to develop a menthol ban scenario implemented in 2021. The public health impacts of the menthol ban are estimated as the difference between smoking and vaping attributable deaths (SVADs) and life years lost (LYLs) in the status quo and the menthol ban scenarios from 2021 to 2060.
Results
Under the menthol ban scenario, overall smoking is projected to decline by 35.7% in 2026 and by 25.3% in 2060 relative to the status quo scenario. With these reductions, SVADs are estimated to fall by about 18.5% and LYLs by 22.1%, translating to 255 895 premature deaths averted, and 4.0 million life years gained over a 40-year period.
Conclusions
A menthol cigarette ban will substantially reduce the smoking-associated health impact on the NHB population, thereby reducing health disparities.
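The scenario comparison described in the Methods of this abstract reduces to a difference of cumulative outcomes between two projections. The sketch below shows only that arithmetic. The helper name `ban_impact` and the yearly figures are hypothetical and are not outputs of the Menthol Smoking and Vaping Model.

```python
def ban_impact(status_quo, ban):
    """Return (absolute, relative) reduction in a cumulative outcome.

    Each argument is a sequence of yearly counts (for example, smoking and
    vaping attributable deaths) over the same projection horizon.
    """
    sq_total = sum(status_quo)
    ban_total = sum(ban)
    averted = sq_total - ban_total
    return averted, averted / sq_total


# Made-up yearly deaths for a three-year horizon; NOT the model's outputs.
status_quo_deaths = [1000, 1000, 1000]
ban_deaths = [950, 800, 750]

averted, share = ban_impact(status_quo_deaths, ban_deaths)
print(averted, f"{share:.1%}")
```

The same function applies unchanged to life years lost, since both outcomes are compared as cumulative totals over the projection period.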