Allele-specific expression (ASE) analysis robustly measures cis-regulatory effects. Here, we present a vast ASE resource generated from the GTEx v8 release, containing 15,253 samples spanning 54 human tissues for a total of 431 million measurements of ASE at the SNP-level and 153 million measurements at the haplotype-level. In addition, we developed an extension of our tool phASER that allows effect sizes of cis-regulatory variants to be estimated using haplotype-level ASE data. This ASE resource is the largest to date, and we are able to make haplotype-level data publicly available. We anticipate that the availability of this resource will enable future studies of regulatory variation across human tissues.
Background
Allele-specific expression (ASE) analysis, or allelic expression analysis, is a powerful technique that can be used to measure the expression of gene alleles relative to one another within single individuals. This makes it well suited to measuring cis-acting regulatory variation using imbalance between alleles in heterozygous individuals (Fig. 1a) (Mohammadi, 2017). ASE analysis can capture both common cis-regulatory variation, for example, expression quantitative trait loci (eQTL), and rare regulatory variation (GTEx Consortium, 2017). It can also be used to measure allele-specific epigenetic effects such as parent-of-origin imprinting (Baran, 2015).
In practice, ASE analysis uses RNA-seq reads that overlap heterozygous single nucleotide polymorphisms (SNPs), where the SNP can be used to assign the read to an allele. These heterozygous SNPs capture the cumulative effects of cis-regulatory variation acting on each allele. In some cases, these effects can be caused by the very SNPs used to measure ASE, for example, stop-gain variants that cause nonsense-mediated decay (NMD; Rivas, 2015), but often the SNPs simply capture the effects of other cis-acting variation. Traditionally, a single SNP has been used to measure ASE, by taking the SNP with the highest coverage per gene. However, as a result of improvements in genome phasing, data can be aggregated across SNPs to produce estimates of allelic expression at the haplotype-level (Fig. 1b). We have previously developed a tool, phASER, which does this systematically, in a way that uses the information contained within reads to improve phasing, while preventing double counting of reads across SNPs to improve the quality of data generated (Castel, 2016).
In this work, we present an ASE resource generated using the Genotype-Tissue Expression (GTEx) version 8 data release, comprising RNA-seq data from 54 tissues and 838 individuals, for a total of 15,253 samples. We generated both SNP-level and haplotype-level ASE data. While the SNP-level data are available to approved users through dbGaP, the haplotype-level data do not contain identifiable information, and we were thus able to make them publicly available on the GTEx portal.
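To make the distinction between SNP-level and haplotype-level ASE concrete, the following is a minimal sketch in Python rather than phASER's actual implementation. The read counts, the `ref_on_hap_a` phase flags, and all function names are hypothetical and chosen only for illustration. The sketch also assumes that each read overlaps exactly one heterozygous SNP, so plain summation does not double count reads; phASER handles reads spanning multiple SNPs explicitly. Allelic imbalance is assessed here with an exact two-sided binomial test against a 50:50 null, which is one common choice and is assumed here, not taken from the text.

```python
from math import comb, log2

# Hypothetical counts for one gene in one sample.
# Each tuple: (ref_reads, alt_reads, ref_on_hap_a), where ref_on_hap_a
# says whether the reference allele sits on haplotype A (phase information).
snps = [
    (30, 10, True),
    (8, 22, False),
    (12, 6, True),
]

def top_snp_counts(snps):
    """Traditional SNP-level ASE: use only the SNP with the highest coverage."""
    return max(((ref, alt) for ref, alt, _ in snps), key=lambda c: c[0] + c[1])

def haplotype_counts(snps):
    """Haplotype-level ASE: sum reads per haplotype across phased SNPs.

    Assumes no read overlaps more than one SNP (no double counting).
    """
    hap_a = hap_b = 0
    for ref, alt, ref_on_hap_a in snps:
        if ref_on_hap_a:
            hap_a += ref
            hap_b += alt
        else:
            hap_a += alt
            hap_b += ref
    return hap_a, hap_b

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

snp_ref, snp_alt = top_snp_counts(snps)
hap_a, hap_b = haplotype_counts(snps)

print("SNP-level counts:", (snp_ref, snp_alt))
print("Haplotype-level counts:", (hap_a, hap_b))
print("Haplotype log2 ratio (A/B):", log2(hap_a / hap_b))
print("Haplotype imbalance p-value:", binomial_two_sided_p(hap_a, hap_a + hap_b))
```

In this toy example the haplotype-level counts pool reads from all three SNPs, so the test has more reads to work with than the single highest-coverage SNP provides.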
Results and Discussion
Both SNP-level and haplotype-level ASE data were generated for each GTEx sample using current best practices, both with and w...