Motivation
Allelic imbalance (AI), i.e. the unequal expression of the alleles of the same gene in a single cell, affects a subset of genes in diploid organisms. One prominent example of AI is parental genomic imprinting, which results in parent-of-origin-dependent, mono-allelic expression of a limited number of genes in metatherian and eutherian mammals and in angiosperms. Currently available methods for identifying AI rely on modeling assumptions about the data and therefore inherit the limitations of those models.
Results
We have designed ISoLDE (Integrative Statistics of alleLe Dependent Expression), a novel nonparametric statistical method that takes into account both AI and the characteristics of RNA-seq data to infer allelic expression bias when at least two biological replicates are available for reciprocal crosses. ISoLDE learns the distribution of a specific test statistic from the data and calls genes ‘allelically imbalanced’, ‘bi-allelically expressed’ or ‘undetermined’. Depending on the number of replicates, calls are made using either predefined thresholds or permutations. We benchmarked ISoLDE against published methods and showed that it compared favorably in sensitivity, specificity and robustness to the number of replicates. Applying ISoLDE to several RNA-seq datasets generated from hybrid mouse tissues, we did not discover any novel imprinted genes (IGs), which supports the most conservative estimates of the number of IGs.
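ISoLDE's own test statistic and calling rules are not given in this summary, so the following is only an illustrative sketch under simplifying assumptions, not ISoLDE's method. It shows why reciprocal crosses plus permutations can separate a parent-of-origin effect from a strain effect. Assume strain A is the mother in cross 1 and the father in cross 2. If expression depends only on strain, the fraction of reads from the A allele has the same distribution in both crosses, so replicate labels are exchangeable between crosses; a parent-of-origin effect instead shifts that fraction in opposite directions. The function names and the difference-of-means statistic below are hypothetical choices made for this example.

```python
from itertools import combinations
from statistics import mean


def allele_a_fraction(a_counts, b_counts):
    """Per-replicate fraction of reads assigned to the strain-A allele (hypothetical helper)."""
    return [a / (a + b) for a, b in zip(a_counts, b_counts)]


def exact_permutation_pvalue(cross1, cross2):
    """Two-sided exact permutation test on the difference in mean allele-A fraction.

    cross1: allele-A fractions for replicates where A is the maternal allele.
    cross2: allele-A fractions for replicates where A is the paternal allele.
    Null hypothesis (assumed here): no parent-of-origin effect, so the cross
    labels can be reassigned among the pooled replicates.
    Returns (observed statistic, p-value).
    """
    pooled = cross1 + cross2
    n1 = len(cross1)
    observed = abs(mean(cross1) - mean(cross2))
    hits = 0
    total = 0
    # Enumerate every way of choosing which pooled replicates form "cross 1".
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = abs(mean(g1) - mean(g2))
        # Small tolerance so ties with the observed split count as extreme.
        if stat >= observed - 1e-12:
            hits += 1
        total += 1
    return observed, hits / total


# Maternally expressed gene: A allele dominates when maternal, B when A is paternal.
imprinted_c1 = allele_a_fraction([90, 85], [10, 15])
imprinted_c2 = allele_a_fraction([12, 8], [88, 92])
obs, p = exact_permutation_pvalue(imprinted_c1, imprinted_c2)
print(round(obs, 3), p)

# Strain-biased gene: A allele favoured in both crosses, regardless of parent.
strain_c1 = allele_a_fraction([70, 72], [30, 28])
strain_c2 = allele_a_fraction([71, 71], [29, 29])
print(exact_permutation_pvalue(strain_c1, strain_c2)[1])
```

With two replicates per cross there are only C(4, 2) = 6 possible label assignments, so even the strongly imprinted example above cannot reach a two-sided p-value below 2/6. This illustrates how coarse an exact permutation null becomes when replicates are few, which is one reason a method might switch between permutations and fixed thresholds depending on the number of replicates.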
Availability and implementation
ISoLDE has been implemented as a Bioconductor package available at http://bioconductor.org/packages/ISoLDE/.
Supplementary information
Supplementary data are available at Bioinformatics online.