Despite the potential importance of genetic variation on the X chromosome, the X chromosome is often omitted from disease association studies. This exclusion has propagated into the post-GWAS era: transcriptome-wide association studies (TWAS) also ignore the X chromosome because adequate models of X chromosome gene expression are lacking. In this work, we trained elastic net penalized models in the brain cortex and whole blood using whole genome sequencing (WGS) and RNA-seq data. To make generalizable recommendations, we evaluated multiple modeling strategies in a homogeneous study population of 175 whole blood samples for 600 genes, and 126 brain cortex samples for 766 genes. SNPs (MAF > 0.05) within each gene's two-megabase flanking window were used to train the tissue-specific model of that gene. We tuned the shrinkage parameter and evaluated model performance with nested cross-validation. Across different mixing parameters, sample sex, and tissue types, we trained 511 significant gene models in total, predicting the expression of 229 genes (98 genes in whole blood and 144 genes in brain cortex). The average model coefficient of determination (R^2) was 0.11 (range: 0.03 to 0.34). We tested a range of mixing parameters (0.05, 0.25, 0.5, 0.75, 0.95) for the elastic net regularization, and compared sex-stratified and sex-combined modeling on the X chromosome. We further investigated genes escaping X chromosome inactivation to determine whether their genetic regulation patterns were distinct. Based on our findings, sex-stratified elastic net models with a balanced penalty (50% LASSO and 50% ridge) are the optimal approach for predicting the expression levels of X chromosome genes, regardless of X chromosome inactivation status. The predictive capacity of the optimal models in whole blood and brain cortex was confirmed through validation using DGN and MayoRNAseq temporal cortex cohort data.
The R^2 of the tissue-specific prediction models ranges from 9.94×10^(-5) to 0.091. These models can be used in TWAS to identify putative causal X chromosome genes by integrating genotype, imputed gene expression, and phenotype information.
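As an illustration of the modeling strategy described above, the following is a minimal sketch of nested cross-validation for an elastic net expression model, written with scikit-learn. It is not the authors' pipeline: the simulated genotypes and expression values, the function name `nested_cv_r2`, the fold counts, and the pooled out-of-fold R^2 are all assumptions made for the example. Note that scikit-learn calls the mixing parameter `l1_ratio` and the shrinkage parameter `alpha`.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold


def nested_cv_r2(X, y, l1_ratio=0.5, n_outer=5, n_inner=5, seed=0):
    """Estimate out-of-sample R^2 for one gene with nested cross-validation.

    Outer folds measure predictive performance; within each outer training
    set, ElasticNetCV runs an inner cross-validation to pick the shrinkage
    strength for the fixed mixing parameter `l1_ratio`.
    """
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    preds = np.empty(len(y), dtype=float)
    for train_idx, test_idx in outer.split(X):
        model = ElasticNetCV(l1_ratio=l1_ratio, cv=n_inner, max_iter=10000)
        model.fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    # Pooled R^2 over all held-out predictions (an assumption of this sketch).
    return r2_score(y, preds)


# Simulated data (hypothetical): 175 samples, 100 SNP dosages coded 0/1/2.
rng = np.random.default_rng(0)
n_samples, n_snps = 175, 100
maf = rng.uniform(0.2, 0.5, size=n_snps)  # all above a MAF > 0.05 filter
X = rng.binomial(2, maf, size=(n_samples, n_snps)).astype(float)

# Expression driven by the first three SNPs plus Gaussian noise.
y = X[:, :3] @ np.array([0.8, -0.8, 0.8]) + rng.normal(size=n_samples)

# Compare the mixing parameters listed in the abstract.
results = {}
for mix in (0.05, 0.25, 0.5, 0.75, 0.95):
    results[mix] = nested_cv_r2(X, y, l1_ratio=mix)
    print(f"mixing parameter {mix:.2f}: nested-CV R^2 = {results[mix]:.3f}")
```

A sex-stratified analysis in this style would call `nested_cv_r2` separately on the male and female subsets, and a sex-combined analysis would call it once on all samples. The sketch does not address how male X chromosome genotypes are coded.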