Significance
This paper compares the probabilistic accuracy of short-term forecasts of reported deaths due to COVID-19 during the first year and a half of the pandemic in the United States. Results show high variation in accuracy between and within stand-alone models and more consistent accuracy from an ensemble model that combined forecasts from all eligible models. This demonstrates that an ensemble model provided a reliable and comparatively accurate means of forecasting deaths during the COVID-19 pandemic that exceeded the performance of all of the models that contributed to it. This work strengthens the evidence base for synthesizing multiple models to support public health action.
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forecasting models provide specific, quantitative, and evaluable predictions that inform short-term decisions such as healthcare staffing needs, school closures, and allocation of medical supplies. Starting in April 2020, the US COVID-19 Forecast Hub (https://covid19forecasthub.org/) collected, disseminated, and synthesized tens of millions of specific predictions from more than 90 academic, industry, and independent research groups. A multi-model ensemble forecast that combined predictions from dozens of research groups each week provided the most consistently accurate probabilistic forecasts of incident deaths due to COVID-19 at the state and national levels from April 2020 through October 2021. The 27 individual models that submitted complete forecasts of COVID-19 deaths consistently throughout this period showed high variability in forecast skill across time, geospatial units, and forecast horizons. Two-thirds of the models evaluated showed better accuracy than a naïve baseline model. Forecast accuracy degraded as models made predictions further into the future, with probabilistic error at a 20-week horizon 3 to 5 times larger than at a 1-week horizon. This project underscores the role that collaboration and active coordination between governmental public health agencies, academic modeling teams, and industry partners can play in developing modern modeling capabilities to support local, state, and federal responses to outbreaks.
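The ensemble referenced above combined probabilistic forecasts submitted as predictive quantiles. As a minimal sketch, assuming each model reports values on a shared set of quantile levels (the Hub's actual combination procedure evolved over time and is described in the paper), a quantile-wise median ensemble can be computed as follows; the model values below are fabricated for illustration.

```python
import numpy as np

def median_ensemble(forecasts: np.ndarray) -> np.ndarray:
    """Combine model forecasts quantile-by-quantile.

    forecasts: array of shape (n_models, n_quantile_levels), where
    forecasts[m, q] is model m's predicted value at quantile level q
    for one location, target, and horizon.
    Returns the ensemble's predicted value at each quantile level.
    """
    return np.median(forecasts, axis=0)

# Fabricated 1-week-ahead death forecasts from three models at the
# 0.025, 0.5, and 0.975 quantile levels:
models = np.array([
    [120.0, 180.0, 260.0],
    [100.0, 150.0, 240.0],
    [140.0, 200.0, 300.0],
])
print(median_ensemble(models))  # -> [120. 180. 260.]
```

Taking the median at each quantile level (rather than a mean) makes the combination robust to a single poorly calibrated or outlying model, which matters when the pool of contributing models changes from week to week.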
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) partnered with an academic research lab at the University of Massachusetts Amherst to create the US COVID-19 Forecast Hub. Launched in April 2020, the Forecast Hub is a dataset of point and probabilistic forecasts of incident cases, incident hospitalizations, incident deaths, and cumulative deaths due to COVID-19 at the county, state, and national levels in the United States. Included forecasts represent a variety of modeling approaches, data sources, and assumptions regarding the spread of COVID-19. The goal of this dataset is to establish a standardized and comparable set of short-term forecasts from modeling teams. These data can be used to develop ensemble models, communicate forecasts to the public, create visualizations, compare models, and inform policies regarding COVID-19 mitigation. These open-source data are available for download from GitHub, through an online API, and through R packages.
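As an illustration of programmatic access, a hedged Python sketch using pandas is shown below. The repository path and file name are illustrative assumptions based on the Hub's public GitHub layout (reichlab/covid19-forecast-hub), not guaranteed artifacts; substitute any model folder and submission date present in the repository.

```python
import pandas as pd

# Hypothetical example path: the Hub stores one CSV per model per
# submission date under data-processed/ in the GitHub repository.
# The specific file name below is illustrative, not a guaranteed artifact.
URL = ("https://raw.githubusercontent.com/reichlab/covid19-forecast-hub/"
       "master/data-processed/COVIDhub-ensemble/"
       "2020-12-14-COVIDhub-ensemble.csv")

# Read location codes as strings so FIPS codes keep leading zeros.
df = pd.read_csv(URL, dtype={"location": str})

# Submissions use a standardized long format with columns such as
# forecast_date, target, location, type ("point" or "quantile"),
# quantile, and value.
deaths = df[df["target"] == "1 wk ahead inc death"]
print(deaths.head())
```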
Background
One of the major challenges in the health care sector is that approximately 80% of generated data remain unstructured and unused. Because unstructured data from electronic medical record systems are difficult to handle, they tend to be neglected in analyses at most hospitals and medical centers. There is therefore a need to analyze unstructured big data in health care systems in order to fully utilize the information they contain.

Objective
In this study, we aimed to extract a list of diseases and associated keywords, along with the corresponding time durations, from an indigenously developed electronic medical record system and to describe the analytics made possible by the acquired datasets.

Methods
We propose a novel finite-state machine to sequentially detect and cluster disease names from patients' medical history. We defined 3 states in the finite-state machine and a transition matrix that depends on the identified keyword. In addition, we defined a state-change action matrix, which is essentially an action associated with each transition. The dataset used in this study was obtained from an indigenously developed electronic medical record system called eyeSmart that was implemented across a large, multitier ophthalmology network in India. The dataset included patients' past medical history and contained records of 10,000 distinct patients.

Results
We extracted disease names and associated keywords using the finite-state machine with an accuracy of 95%, sensitivity of 94.9%, and positive predictive value of 100%. For the extraction of disease durations, the machine achieved an accuracy of 93%, sensitivity of 92.9%, and positive predictive value of 100%.

Conclusions
We demonstrated that the finite-state machine developed in this study can accurately identify disease names, associated keywords, and time durations from a large cohort of patient records obtained from an electronic medical record system.
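To make the approach concrete, the sketch below implements a keyword-driven finite-state machine in the same spirit. The three states, keyword sets, and flush actions are hypothetical stand-ins; the paper's actual transition and state-change action matrices are not reproduced here.

```python
# Minimal sketch of a keyword-driven finite-state machine for scanning
# free-text medical history. The states, keyword lists, and actions
# below are hypothetical illustrations of the general pattern.

DISEASES = {"diabetes", "hypertension", "glaucoma"}
DURATION_UNITS = {"year", "years", "month", "months"}

def extract(tokens):
    """Return [{'disease': ..., 'duration': ...}, ...] found in tokens."""
    state = "SCAN"  # states: SCAN -> IN_DISEASE -> IN_DURATION
    findings, current = [], None
    for tok in tokens:
        word = tok.lower().strip(".,")
        if state == "SCAN":
            if word in DISEASES:  # transition: disease keyword seen
                current = {"disease": word, "duration": None}
                state = "IN_DISEASE"
        elif state == "IN_DISEASE":
            if word.isdigit():  # transition: duration number seen
                current["duration"] = word
                state = "IN_DURATION"
            elif word in DISEASES:  # action: flush previous disease
                findings.append(current)
                current = {"disease": word, "duration": None}
        elif state == "IN_DURATION":
            if word in DURATION_UNITS:
                current["duration"] += " " + word
            findings.append(current)  # action: record and reset
            current, state = None, "SCAN"
    if current is not None:
        findings.append(current)
    return findings

history = "Known diabetes since 5 years and hypertension since 2 years"
print(extract(history.split()))
# [{'disease': 'diabetes', 'duration': '5 years'},
#  {'disease': 'hypertension', 'duration': '2 years'}]
```

Encoding the scan as explicit states keeps each token's handling local to one branch, which is what makes the transition and action logic auditable against a matrix specification.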
Interpretability of epidemiological models is a key consideration, especially when these models are used in a public health setting. Interpretability is strongly linked to the identifiability of the underlying model parameters, i.e., the ability to estimate parameter values with high confidence given observations. In this paper, we define three separate notions of identifiability that explore the different roles played by the model definition, the loss function, the fitting methodology, and the quality and quantity of data. We define an epidemiological compartmental model framework in which we highlight these non-identifiability issues and their mitigation.

INTRODUCTION
The global COVID-19 pandemic has spurred intense interest in epidemiological compartmental models (Thompson, 2020; Brauer, 2008). The use of epidemiological models is driven by four main criteria: expressivity, to faithfully capture the disease dynamics; learnability of parameters conditioned on the available data; interpretability, to understand the evolution of the pandemic; and generalizability to future scenarios by incorporating additional information.

Compartmental models are popular because they are relatively simple and known to be highly expressive and generalizable. However, their interpretability and learnability depend strongly on the alignment between data observations and model complexity. Different choices of model parameters can often lead to (approximately) the same forecast case counts, leading to what is commonly referred to as non-identifiability (Raue et al., 2009; Jacquez & Perry, 1990). The problem of non-identifiability is only exacerbated with increased model complexity.

This lack of identifiability is detrimental because (a) parameter distributions estimated from the observed data tend to be biased with large variances, thus precluding easy interpretation, and (b) non-identifiable models typically have reduced accuracy on long-term forecasts due to the high parameter variance. This phenomenon is illustrated in Figure 1, where the forecasting errors of a non-identifiable model and its reparametrized version (later shown to be identifiable) are compared.

Non-identifiability in epidemiological models is rooted in the model dynamics, in the fitting loss function and methodology, and in the quality and quantity of data available. Identifiability is typically broadly classified into structural (i.e., purely model-dependent) (Reid, 1977; Massonis et al., 2020) and practical (i.e., data-, loss-, and fitting-methodology-dependent), with the latter often defined vaguely and in the context of specific loss functions (Raue et al., 2009; Wieland et al., 2021).

Contributions. This paper delineates various general notions of model identifiability that are contextualized in compartmental epidemiological models, including a novel notion of statistical identifiability that depends on the loss function optimized in estimation, and an empirical framework to assess practical identifiability in terms of the highest posterior density intervals. We study...
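As a concrete illustration of practical non-identifiability (our own example, not the paper's Figure 1), the sketch below simulates two SIR parameterizations whose early growth rates beta*S - gamma coincide. Short case-count series alone therefore cannot separate beta from gamma, even though the two parameter sets imply different reproduction numbers and long-run dynamics.

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative sketch: two SIR parameterizations with the same early
# growth rate beta - gamma produce closely overlapping short time
# series, a classic practical non-identifiability.

def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

y0 = [0.999, 0.001, 0.0]      # initial S, I, R fractions
t = np.linspace(0, 20, 21)    # first 20 days only

# Both pairs satisfy beta - gamma = 0.2, but R0 differs (~1.67 vs 2.0).
i_a = odeint(sir, y0, t, args=(0.5, 0.3))[:, 1]
i_b = odeint(sir, y0, t, args=(0.4, 0.2))[:, 1]

print(f"max relative gap over 20 days: {np.max(np.abs(i_a - i_b) / i_a):.3f}")
```

Fitting either parameter pair to the first 20 days of data yields nearly the same loss, so point estimates of beta and gamma are unstable while their difference is well constrained; this is the kind of behavior the paper's identifiability notions are designed to diagnose.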