Background: Infectious diseases are one of the primary healthcare problems worldwide, leading to millions of deaths annually. To develop effective control and prevention strategies, we need reliable computational tools to understand disease dynamics and to predict future cases. These computational tools can be used by policy makers to make more informed decisions. Methodology/Principal findings: In this study, we developed a computational framework based on Gaussian processes to perform spatiotemporal prediction of infectious diseases and exploited the special structure of similarity matrices in our formulation to obtain a very efficient implementation. We then tested our framework on the problem of modeling Crimean–Congo hemorrhagic fever cases between years 2004 and 2015 in Turkey. Conclusions/Significance: We showed that our Gaussian process formulation obtained better results than two frequently used standard machine learning algorithms (i.e., random forests and boosted regression trees) under temporal, spatial, and spatiotemporal prediction scenarios. These results showed that our framework has the potential to make an important contribution to public health policy makers.
Abstract. Objectives: We aimed to develop a prospective prediction tool on Crimean–Congo haemorrhagic fever (CCHF) to identify geographic regions at risk. The tool could support public health decision-makers in implementation of an effective control strategy in a timely manner. Methods: We used monthly surveillance data between 2004 and 2015 to predict case counts between 2016 and 2017 prospectively. The Turkish nationwide surveillance data set collected by the Ministry of Health contained 10 411 confirmed CCHF cases. We collected potential explanatory covariates about climate, land use, and animal and human populations at risk to capture spatiotemporal transmission dynamics. We developed a structured Gaussian process algorithm and prospectively tested this tool predicting the future year's cases given past years' cases. Results: We predicted the annual cases in 2016 and 2017 as 438 and 341, whereas the observed cases were 432 and 343, respectively. Pearson's correlation coefficient and normalized root mean squared error values for 2016 and 2017 predictions were (0.83; 0.58) and (0.87; 0.52), respectively. The most important covariates were found to be the number of settlements with fewer than 25 000 inhabitants, latitude, longitude and potential evapotranspiration (evaporation and transpiration). Conclusions: Main driving factors of CCHF dynamics were human population at risk in rural areas, geographical dependency and climate effect on ticks. Our model was able to prospectively predict the numbers of CCHF cases. Our proof-of-concept study also provided insight for understanding possible mechanisms of infectious diseases and found important directions for practice and policy to combat against emerging infectious diseases. Ç. Ak, Clin Microbiol Infect 2020;26:123.e1–123.e7
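The two evaluation metrics reported in the abstract above, Pearson's correlation coefficient and normalized root mean squared error (NRMSE), can be illustrated with a minimal Python sketch. The abstract does not state how the RMSE was normalized; dividing by the standard deviation of the observed values is an assumption made here, and the case counts below are hypothetical numbers for illustration only, not data from the study.

```python
import numpy as np


def pearson_correlation(observed, predicted):
    """Pearson's correlation coefficient between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])


def normalized_rmse(observed, predicted):
    """RMSE divided by the standard deviation of the observed values.

    Assumption: the normalization constant is the (population) standard
    deviation of the observations; the source does not specify this.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return float(rmse / np.std(observed))


# Hypothetical monthly case counts for one region (illustration only).
observed = np.array([0, 2, 15, 40, 60, 35, 10, 1])
predicted = np.array([1, 3, 12, 38, 55, 40, 12, 2])

print("PCC:", pearson_correlation(observed, predicted))
print("NRMSE:", normalized_rmse(observed, predicted))
```

Under this normalization, a model that always predicts the mean of the observations has an NRMSE of 1, so values below 1 indicate predictions that beat that baseline.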
Breast cancers are known to be driven by the transcription factor estrogen receptor and its ligand estrogen. While the receptor's cis-binding elements are known to vary between tumors, heterogeneity of hormone signaling at a single-cell level is unknown. In this study, we systematically tracked estrogen response across time at a single-cell level in multiple cell line and organoid models. To accurately model these changes, we developed a computational tool (TITAN) that quantifies signaling gradients in single-cell datasets. Using this approach, we found that gene expression response to estrogen is non-uniform, with distinct cell groups expressing divergent transcriptional networks. Pathway analysis suggested the two most distinct signatures are driven separately by ER and FOXM1. We observed that FOXM1 was indeed activated by phosphorylation upon estrogen stimulation and silencing of FOXM1 attenuated the relevant gene signature. Analysis of scRNA-seq data from patient samples confirmed the existence of these divergent cell groups, with the FOXM1 signature predominantly found in ER negative cells. Further, multi-omic single-cell experiments indicated that the different cell groups have distinct chromatin accessibility states. Our results provide a comprehensive insight into ER biology at the single-cell level and potential therapeutic strategies to mitigate resistance to therapy.
The impact of COVID-19 across the United States (US) has been heterogeneous, with rapid spread and greater mortality in some areas compared with others. We used geographically-linked data to test the hypothesis that the risk for COVID-19 was defined by location and sought to define which demographic features were most closely associated with elevated COVID-19 spread and mortality. We leveraged geographically-restricted social, economic, political, and demographic information from US counties to develop a computational framework using structured Gaussian process to predict county-level case and death counts during the pandemic’s initial and nationwide phases. After identifying the most predictive information sources by location, we applied an unsupervised clustering algorithm and topic modeling to identify groups of features most closely associated with COVID-19 spread. Our model successfully predicted COVID-19 case counts of unseen locations after examining case counts and demographic information of neighboring locations, with overall Pearson’s correlation coefficient and the proportion of variance explained as 0.96 and 0.84 during the initial phase and 0.95 and 0.87 during the nationwide phase, respectively. Aside from population metrics, presidential vote margin was the most consistently selected spatial feature in our COVID-19 prediction models. Urbanicity and 2020 presidential vote margins were more predictive than other demographic features. Models trained using death counts showed similar performance metrics. Topic modeling showed that counties with similar socioeconomic and demographic features tended to group together, and some of these feature sets were associated with COVID-19 dynamics. Clustering of counties based on these feature groups found by topic modeling revealed groups of counties that experienced markedly different COVID-19 spread. 
We conclude that topic modeling can be used to group similar features and identify counties with similar features in epidemiologic research.
Background: The impact of COVID-19 across the United States has been heterogeneous, with some areas demonstrating more rapid spread and greater mortality than others. We used geographically-linked data to test the hypothesis that the risk for COVID-19 is spatially defined and sought to define which features are most closely associated with elevated COVID-19 spread and mortality. Methods: Leveraging geographically-restricted social, economic, political, and demographic information from U.S. counties, we developed a computational framework using a structured Gaussian process to predict county-level case and death counts during both the initial and the nationwide phases of the pandemic. After identifying the most predictive spatial features, we applied an unsupervised clustering algorithm, topic modeling, to identify groups of features that are most closely associated with COVID-19 spread. Findings: We found that models including spatial features predicted case counts very well, with overall Pearson's correlation coefficient (PCC) and R² of 0.96 and 0.84 during the initial phase and 0.95 and 0.87 during the nationwide phase, respectively. The most frequently selected features were associated with urbanicity and 2020 presidential vote margins. When trained using death counts, models showed similar performance metrics, with aging metrics added to the most frequently selected features. Topic modeling showed that counties with similar socioeconomic and demographic features tended to group together, and some feature sets were associated with COVID-19 dynamics. Unsupervised clustering of counties based on these topics revealed groups of counties that experienced markedly different COVID-19 spread. Interpretation: Spatial features explained most of the variability in COVID-19 dynamics between counties. Topic modeling can be used to group collinear features and identify counties with similar features in epidemiologic research.