Rare genetic variants make significant contributions to human diseases. Compared to common variants, rare variants have larger effect sizes and are generally free of linkage disequilibrium (LD), which makes it easier to identify causal variants. Numerous methods have been developed to analyze rare variants in a gene or region in association studies, with the goal of finding risk genes by aggregating information across all variants of a gene. These methods, however, often make unrealistic assumptions, e.g., that all rare variants in a risk gene have non-zero effects. In practice, current methods for gene-based analysis often fail to show any advantage over simple single-variant analysis. In this work, we develop a Bayesian method: MIxture model based Rare variant Analysis on GEnes (MIRAGE). MIRAGE captures the heterogeneity of variant effects by treating all variants of a gene as a mixture of risk and non-risk variants, and models the prior probability of being a risk variant as a function of external information about the variants, such as allele frequencies and predicted deleterious effects. MIRAGE uses an empirical Bayes approach to estimate these prior probabilities by combining information across genes. We demonstrate, in both simulations and the analysis of an exome-sequencing dataset of Autism, that MIRAGE significantly outperforms current methods for rare variant analysis. In particular, the top genes identified by MIRAGE are highly enriched with known or plausible Autism risk genes. Our results highlight several novel Autism genes with high Bayesian posterior probabilities and functional connections with Autism. MIRAGE is available at https://xinhe-lab.github.io/mirage.
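The mixture idea described above can be illustrated with a minimal sketch. This is not the MIRAGE implementation. It assumes that each variant already has a Bayes factor comparing "risk variant" against "non-risk variant", and that the prior probability of being a risk variant is supplied directly rather than estimated from annotations by empirical Bayes. The function names `gene_log_bf` and `gene_posterior`, and all numbers in the usage example, are hypothetical.

```python
import math


def gene_log_bf(variant_bfs, risk_priors):
    """Log Bayes factor for a gene under a two-component variant mixture.

    Assumption: variant j is a risk variant with prior probability
    risk_priors[j], in which case it contributes its Bayes factor
    variant_bfs[j]; otherwise it contributes 1 (no evidence either way).
    """
    if len(variant_bfs) != len(risk_priors):
        raise ValueError("need exactly one prior per variant")
    total = 0.0
    for bf, eta in zip(variant_bfs, risk_priors):
        if not 0.0 <= eta <= 1.0:
            raise ValueError("priors must lie in [0, 1]")
        # Mixture of the non-risk component (weight 1 - eta) and the
        # risk component (weight eta), accumulated on the log scale.
        total += math.log((1.0 - eta) + eta * bf)
    return total


def gene_posterior(log_bf, prior_risk_gene):
    """Posterior probability that the gene is a risk gene.

    prior_risk_gene is the assumed prior proportion of risk genes,
    strictly between 0 and 1.
    """
    if not 0.0 < prior_risk_gene < 1.0:
        raise ValueError("prior_risk_gene must lie strictly between 0 and 1")
    log_odds = log_bf + math.log(prior_risk_gene) - math.log1p(-prior_risk_gene)
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Hypothetical gene with three rare variants; the last one has strong
    # evidence and a high prior (e.g. a predicted deleterious variant).
    bfs = [0.9, 1.2, 25.0]
    priors = [0.05, 0.05, 0.5]
    log_bf = gene_log_bf(bfs, priors)
    print(log_bf, gene_posterior(log_bf, prior_risk_gene=0.05))
```

Because each variant enters through a weighted average of 1 and its own Bayes factor, a gene with many neutral variants is not penalized the way it would be if every variant were assumed to have a non-zero effect.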
Introduction

Genome-wide association studies (GWAS) have successfully identified thousands of loci associated with human complex traits [1][2][3]. However, in most of these loci, the causal variants and their target genes remain unknown. Additionally, most common variants (with minor allele frequency greater than 5%) discovered by GWAS have small effect sizes, modifying disease risk by less than two fold [2,3].
Sequencing studies focusing on rare variants have the potential to improve our understanding of complex diseases beyond GWAS. Because of purifying selection, deleterious variants with large effects on disease risks tend to be rare in the population, as seen in the cases of many Mendelian diseases [4][5][6][7]. Furthermore, linkage disequilibrium is much weaker for rare variants, making it less complicated to fine-map causal variants. Exome sequencing studies have particular advantages because of their relatively low costs, and the ability to directly implicate risk genes [8].
Statistical association tests for individual rare variants are usually under-powered due to their low allele frequency. This poses a significant challenge for rare variant analysis. A natural strategy is to aggregate all rare variants in a genomic region or gene, to test the collective association of the region or gene with phenotype [9]. Over the pas...