With the growing importance of microbiome research, there is increasing evidence that host variation in microbial communities is associated with overall host health. Advancement in genetic sequencing methods for microbiomes has coincided with improvements in machine learning, with important implications for disease risk prediction in humans. One aspect specific to microbiome prediction is the use of taxonomy-informed feature selection. In this review for non-experts, we explore the most commonly used machine learning methods, and evaluate their prediction accuracy as applied to microbiome host trait prediction. Methods are described at an introductory level, and R/Python code for the analyses is provided.
Expression quantitative trait locus (eQTL) mapping provides a powerful means to identify functional variants influencing gene expression and disease pathogenesis. We report the identification of cis-eQTLs from 7,051 post-mortem samples representing 44 tissues and 449 individuals as part of the Genotype-Tissue Expression (GTEx) project. We find a cis-eQTL for 88% of all annotated protein-coding genes, with one-third having multiple independent effects. We identify numerous tissue-specific cis-eQTLs, highlighting the unique functional impact of regulatory variation in diverse tissues. By integrating large-scale functional genomics data and state-of-the-art fine-mapping algorithms, we identify multiple features predictive of tissue-specific and shared regulatory effects. We improve estimates of cis-eQTL sharing and effect sizes using allele-specific expression across tissues. Finally, we demonstrate the utility of this large compendium of cis-eQTLs for understanding the tissue-specific etiology of complex traits, including coronary artery disease. The GTEx project provides an exceptional resource that has improved our understanding of gene regulation across tissues and the role of regulatory variation in human genetic diseases.
Introduction
Genome-wide association studies (GWAS) have identified a wealth of genetic variants associated with complex traits and disease risk. However, characterizing the molecular and cellular mechanisms through which these variants act remains a major challenge that limits our understanding of disease pathogenesis and the development of therapeutic interventions. Expression quantitative trait locus (eQTL) studies provide a systematic approach to characterize the molecular consequences of genetic variation across tissues and cell types [1-4]. Multiple studies have identified eQTLs for thousands of genes [5-7], providing novel insights into gene regulation and enabling the interpretation of GWAS signals [8-12]. These studies have largely been performed in a few easily accessible cell types and cell lines, precluding interpretation of the systemic and tissue-specific consequences of genetic variation.
To overcome these limitations, the Genotype-Tissue Expression (GTEx) project was designed to identify and characterize eQTLs across a broad range of tissues. During the pilot phase, which focused on nine tissues, the GTEx project highlighted patterns of eQTL tissue-specificity and demonstrated the value of multi-tissue study designs for identifying causal genes and tissues for trait-associated variants [1]. These results indicated that the identification of eQTLs across an even broader range of tissues would drastically improve characterization of the gene- and tissue-specific consequences of genetic variants.
Here, we report on the discovery of cis-eQTLs across an expanded collection of 44 tissues in the GTEx V6p study. This dataset consists of 7,051 transcriptomes from 449 individuals and 4...
Expression quantitative trait locus (eQTL) studies in human liver are crucial for elucidating how genetic variation influences variability in disease risk and therapeutic outcomes and may help guide strategies to obtain maximal efficacy and safety of clinical interventions. Associations between expression microarray and genome-wide genotype data from four human liver eQTL studies (n = 1,183) were analyzed. More than 2.3 million cis-eQTLs for 15,668 genes were *
Peter Hall's work illuminated many aspects of statistical thought, some of which are very well known, including the bootstrap and smoothing. However, he also explored many other lesser-known aspects of mathematical statistics. This is a survey of one of those areas, initiated by a seminal paper in 2005, on high dimension low sample size asymptotics. An interesting characteristic of that first paper, and of many of the following papers, is that they contain deep and insightful concepts which are frequently surprising and counter-intuitive, yet have mathematical underpinnings which tend to be direct and not difficult to prove.