Background: Since the onset of the SARS-CoV-2 pandemic, most clinical testing has focused on RT-PCR [1]. Host epigenome manipulation following coronavirus infection [2–4] suggests that DNA methylation signatures may differentiate patients with SARS-CoV-2 infection from uninfected individuals and help predict COVID-19 disease severity, even at initial presentation.
Methods: We customized Illumina's Infinium MethylationEPIC array to enhance the detection of immune responses and profiled peripheral blood samples from 164 COVID-19 patients with longitudinal measurements of disease severity and from 296 patient controls.
Results: An epigenome-wide association analysis revealed 13,033 genome-wide significant methylation sites for case-vs-control status. Genes and pathways involved in interferon signaling and viral response were significantly enriched among the differentially methylated sites. We observed highly significant associations at genes previously reported in genetic association studies (e.g., IRF7, OAS1). Using machine learning, models built with sparse regression were highly predictive: the cross-validated best-fit AUC was 93.6% for case-vs-control status and 79.1%, 80.8%, and 84.4% for hospitalization, ICU admission, and progression to death, respectively.
Conclusions: The strong COVID-19-specific epigenetic signature in peripheral blood, driven by key immune-related pathways and associated with infection status, disease severity, and clinical deterioration, provides insights useful for the diagnosis and prognosis of patients with viral infections.
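The AUC values reported above measure how well a model's scores separate two groups. As a minimal sketch (not the authors' pipeline), AUC can be computed as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half; this is the Mann-Whitney U statistic rescaled to [0, 1]. The function name `roc_auc` and the example scores are hypothetical; in a cross-validated analysis the scores would be out-of-fold predictions from the fitted sparse regression model.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for a case, 0 for a control.
    Returns the fraction of (case, control) pairs in which the case
    scores higher, with ties counted as half a pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Usage: hypothetical out-of-fold scores for two controls and two cases.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic in sample size; for a few hundred samples, as here, that is negligible, and it keeps the definition of AUC explicit.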
SNP heritability of a trait is the proportion of total phenotypic variance explained by the additive effects of genome-wide single nucleotide polymorphisms (SNPs). Linear mixed models are routinely used to estimate SNP heritability for many complex traits, which requires estimating a genetic relationship matrix (GRM) among individuals. Heritability is usually estimated by restricted maximum likelihood (REML) or by method of moments (MOM) approaches such as Haseman-Elston (HE) regression. Population substructure in the sample can bias these estimates, and the common practice for accounting for it is to include the top few principal components of the GRM as covariates in the linear mixed model. This can become computationally very intensive on large biobank-scale datasets. Here we propose an MOM approach for estimating SNP heritability in the presence of population substructure. The proposed method is computationally scalable on biobank datasets and gives an asymptotically unbiased estimate of heritability in the presence of discrete substructures. It introduces the adjustments for population stratification into a second-order estimating equation. It allows these substructures to differ in their SNP allele frequencies and in their trait distributions (means and variances), while the heritability is assumed to be the same across substructures. Through extensive simulation studies and an application to 7 quantitative traits in the UK Biobank cohort, we demonstrate that the proposed method performs well in the presence of population substructure and is much more computationally efficient than existing approaches.
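For reference, plain Haseman-Elston regression, the standard MOM baseline named above, can be sketched in pure Python. This is not the proposed estimator: it includes no adjustment for population substructure. The sketch assumes genotypes coded as 0/1/2 allele counts, standardizes each SNP by its sample mean and standard deviation to build the GRM A = Z Z^T / M, standardizes the phenotype to z, and regresses the products z_i z_j on A_ij over pairs i < j without an intercept, since E[z_i z_j] = h^2 A_ij for i != j under the additive model. The function names `standardize`, `grm`, and `he_regression` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def standardize(values: List[float]) -> List[float]:
    """Center to mean 0 and scale to unit variance (population SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def grm(genotypes: Matrix) -> Matrix:
    """GRM A = Z Z^T / M from an n x M matrix of 0/1/2 allele counts.

    Each SNP column is standardized across individuals; monomorphic SNPs
    are dropped because their standard deviation is zero.
    """
    n = len(genotypes)
    z_cols = [standardize(list(col)) for col in zip(*genotypes)
              if len(set(col)) > 1]
    m = len(z_cols)
    if m == 0:
        raise ValueError("no polymorphic SNPs")
    return [[sum(col[i] * col[j] for col in z_cols) / m for j in range(n)]
            for i in range(n)]


def he_regression(a: Matrix, phenotype: List[float]) -> float:
    """Haseman-Elston estimate of h^2 (no intercept, no covariates).

    With a standardized phenotype z, E[z_i * z_j] = h^2 * A_ij for i != j,
    so the least-squares slope of z_i * z_j on A_ij over pairs i < j
    estimates h^2.
    """
    z = standardize(phenotype)
    num = den = 0.0
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            num += a[i][j] * z[i] * z[j]
            den += a[i][j] ** 2
    return num / den


# Usage (hypothetical data): h2 = he_regression(grm(genotypes), phenotype)
```

Because every pair of individuals enters the sum, this naive version costs O(n^2 M) to build the GRM; scalable MOM estimators avoid forming the full n x n matrix.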
Objective: Multiplex immunohistochemistry (mIHC) and multiplexed ion beam imaging (MIBI) images are usually phenotyped using a manual thresholding process. This thresholding is prone to bias, especially when examining multiple images with high cellularity.
Results: Unsupervised cell-phenotyping methods, including PhenoGraph, flowMeans, and SamSPECTRAL, which are primarily used for flow cytometry data, often perform poorly or need elaborate tuning to perform well on mIHC and MIBI data. We show that semi-supervised cell clustering using random forests and linear and quadratic discriminant analysis is superior. We test the methods on two mIHC datasets from the University of Colorado School of Medicine and on a publicly available MIBI dataset, each containing many highly complex images.
With the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals using a linear mixed model (LMM). Fitting such an LMM in a large-scale cohort study, however, is tremendously challenging because of the high-dimensional linear algebra it involves. In this paper, we propose a new method, PredLMM, which approximates this LMM and is motivated by the concepts of genetic coalescence and the Gaussian predictive process. PredLMM has substantially better computational complexity than most existing LMM-based methods and thus provides a fast alternative for estimating heritability in large-scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods, which has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits in the UK Biobank cohort.
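The predictive process idea can be illustrated on a GRM directly. A minimal sketch, assuming a small set S of "knot" individuals (chosen by hand here), approximates the n x n matrix A by A[:, S] A[S, S]^{-1} A[S, :], a matrix of rank at most |S|. This is the generic predictive process (Nyström-type) form, not PredLMM itself, and the function names `invert` and `predictive_process_approx` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("knot block A[S, S] is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def predictive_process_approx(a: Matrix, knots: Sequence[int]) -> Matrix:
    """Rank-|S| approximation A ~= A[:, S] A[S, S]^{-1} A[S, :].

    Cost is O(m^3) for the m x m inverse plus O(n m^2) for A[:, S] times
    that inverse; the final O(n^2 m) product is formed here only for
    illustration.
    """
    n, m = len(a), len(knots)
    w = invert([[a[s][t] for t in knots] for s in knots])
    a_ns = [[a[i][s] for s in knots] for i in range(n)]
    # c = A[:, S] @ A[S, S]^{-1}  (n x m)
    c = [[sum(row[k] * w[k][t] for k in range(m)) for t in range(m)]
         for row in a_ns]
    # approximation = c @ A[S, :]  (n x n)
    return [[sum(c[i][t] * a[knots[t]][j] for t in range(m)) for j in range(n)]
            for i in range(n)]


# Usage: a small positive-definite "GRM" with individuals 0 and 2 as knots.
A = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
approx = predictive_process_approx(A, [0, 2])
print(round(approx[1][1], 6))  # 0.4 (the exact entry is 1.0)
```

In practice the full n x n approximation would not be formed; the low-rank structure is what lets likelihood computations scale with the number of knots rather than with n. The example also shows that the approximation shrinks the diagonal for non-knot individuals, a known limitation that modified versions of the predictive process address with a diagonal correction.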