Predictions of COVID-19 case growth and mortality are critical to the decisions of political leaders, businesses, and individuals grappling with the pandemic. This predictive task is challenging due to the novelty of the virus, limited data, and dynamic political and societal responses. We embed a Bayesian time series model and a random forest algorithm within an epidemiological compartmental model for empirically grounded COVID-19 predictions. The Bayesian case model fits a location-specific curve to the velocity (first derivative) of the log transformed cumulative case count, borrowing strength across geographic locations and incorporating prior information to obtain a posterior distribution for case trajectories. The compartmental model uses this distribution and predicts deaths using a random forest algorithm trained on COVID-19 data and population-level characteristics, yielding daily projections and interval estimates for cases and deaths in U.S. states. We evaluated the model by training it on progressively longer periods of the pandemic and computing its predictive accuracy over 21-day forecasts. The substantial variation in predicted trajectories and associated uncertainty between states is illustrated by comparing three unique locations: New York, Colorado, and West Virginia. The sophistication and accuracy of this COVID-19 model offer reliable predictions and uncertainty estimates for the current trajectory of the pandemic in the U.S. and provide a platform for future predictions as shifting political and societal responses alter its course.
In emerging epidemics, early estimates of key epidemiological characteristics of the disease are critical for guiding public policy. In particular, identifying high-risk population subgroups aids policymakers and health officials in combating the epidemic. This has been challenging during the coronavirus disease 2019 (COVID-19) pandemic because governmental agencies typically release aggregate COVID-19 data as summary statistics of patient demographics. These data may identify disparities in COVID-19 outcomes between broad population subgroups, but do not provide comparisons between more granular population subgroups defined by combinations of multiple demographics. We introduce a method that helps to overcome the limitations of aggregated summary statistics and yields estimates of COVID-19 infection and case fatality rates — key quantities for guiding public policy related to the control and prevention of COVID-19 — for population subgroups across combinations of demographic characteristics. Our approach uses pseudo-likelihood based logistic regression to combine aggregate COVID-19 case and fatality data with population-level demographic survey data to estimate infection and case fatality rates for population subgroups across combinations of demographic characteristics. We illustrate our method on California COVID-19 data to estimate test-based infection and case fatality rates for population subgroups defined by gender, age, and race/ethnicity. Our analysis indicates that in California, males have higher test-based infection rates and test-based case fatality rates across age and race/ethnicity groups, with the gender gap widening with increasing age. Although elderly infected with COVID-19 are at an elevated risk of mortality, the test-based infection rates do not increase monotonically with age. The workforce population, especially, has a higher test-based infection rate than children, adolescents, and other elderly people in their 60–80. LatinX and African Americans have higher test-based infection rates than other race/ethnicity groups. The subgroups with the highest 5 test-based case fatality rates are all-male groups with race as African American, Asian, Multi-race, LatinX, and White, followed by African American females, indicating that African Americans are an especially vulnerable California subpopulation.
Predictions of COVID-19 case growth and mortality are critical to the decisions of political leaders, businesses, and individuals grappling with the pandemic. This predictive task is challenging due to the novelty of the virus, limited data, and dynamic political and societal responses. We embed a Bayesian nonlinear mixed model and a random forest algorithm within an epidemiological compartmental model for empirically grounded COVID-19 predictions. The Bayesian case model fits a location-specific curve to the velocity (first derivative) of the transformed cumulative case count, borrowing strength across geographic locations and incorporating prior information to obtain a posterior distribution for case trajectory. The compartmental model uses this distribution and predicts deaths using a random forest algorithm trained on COVID-19 data and population-level characteristics, yielding daily projections and interval estimates for infections and deaths in U.S. states. We evaluate forecasting accuracy on a two-week holdout set, finding that the model predicts COVID-19 cases and deaths well, with a mean absolute scaled error of 0.40 for cases and 0.32 for deaths throughout the two-week evaluation period. The substantial variation in predicted trajectories and associated uncertainty between states is illustrated by comparing three unique locations: New York, Ohio, and Mississippi. The sophistication and accuracy of this COVID-19 model offer reliable predictions and uncertainty estimates for the current trajectory of the pandemic in the U.S. and provide a platform for future predictions as shifting political and societal responses alter its course. Author summaryCOVID-19 models can be roughly classified as mathematical models that simulate disease within a population, including epidemiological compartmental models, or statistical curve-fitting models that fit a function to observed data and extrapolate forward into the future. Bridging this divide, we combine the strengths of curve-fitting statistical models and the structure of epidemiological models, by embedding a Bayesian nonlinear mixed model for case velocity and a machine learning algorithm (random : medRxiv preprint forest) into the framework of a compartmental model. Fusing these models together exploits the particular strengths of each to glean as much information as possible from the currently available data. We also identify the velocity of log cumulative cases as an excellent target for modeling and extrapolating COVID-19 case trajectories. We empirically evaluate the predictive performance of the model and provide predicted trajectories with credible intervals for cumulative confirmed case count, active confirmed infections and COVID-19 deaths for each of the 50 U.S. states. Combining sophisticated data analytic methods with proven epidemiological models offers an empirically grounded strategy for making realistic predictions and quantifying their uncertainty. These predictions indicate substantial variation in the COVID-19 trajectories of U.S. states.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.