Two decades after the discovery of the first animal microRNA (miRNA), the number of miRNAs in animal genomes remains a vexing question. Here, we report findings from analyzing 1,323 short RNA sequencing (RNA-seq) samples from 13 different human tissue types. Using stringent thresholding criteria, we identified 3,707 statistically significant novel mature miRNAs at a false discovery rate of ≤0.05 arising from 3,494 novel precursors; 91.5% of these novel miRNAs were identified independently in 10 or more of the processed samples. Analysis of these novel miRNAs revealed tissue-specific dependencies and a commensurately low Jaccard similarity index in intertissue comparisons. Of these novel miRNAs, 1,657 (45%) were identified in 43 datasets that were generated by cross-linking followed by Argonaute immunoprecipitation and sequencing (Ago CLIP-seq) and represented 3 of the 13 tissues, indicating that these miRNAs are active in the RNA interference pathway. Moreover, experimental investigation through stem-loop PCR of a random collection of newly discovered miRNAs in 12 cell lines representing 5 tissues confirmed their presence and tissue dependence. Among the newly identified miRNAs are many novel miRNA clusters, new members of known miRNA clusters, previously unreported products from uncharacterized arms of miRNA precursors, and previously unrecognized paralogues of functionally important miRNA families (e.g., miR-15/107). Examination of sequence conservation across vertebrate and invertebrate organisms showed that 56.7% of the newly discovered miRNAs are human-specific, whereas the large majority (94.4%) are specific to the primate lineage. Our findings suggest that the repertoire of human miRNAs is far more extensive than currently represented by public repositories and that there is a significant number of lineage- and/or tissue-specific miRNAs that are uncharacterized.
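For readers unfamiliar with the Jaccard similarity index used in the intertissue comparisons, the following is a minimal sketch of how such an index can be computed between the sets of miRNAs detected in two tissues. The tissue labels and miRNA identifiers are hypothetical placeholders, not data from this study, and the function name `jaccard_index` is our own choice for illustration.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets have similarity 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical placeholder data: miRNA identifiers detected per tissue.
tissue_mirnas = {
    "tissue_A": {"novel-1", "novel-2", "novel-3", "novel-4"},
    "tissue_B": {"novel-3", "novel-4", "novel-5"},
    "tissue_C": {"novel-6"},
}

# Pairwise index for every unordered pair of tissues.
pairwise = {
    (s, t): jaccard_index(tissue_mirnas[s], tissue_mirnas[t])
    for s, t in combinations(sorted(tissue_mirnas), 2)
}

for (s, t), value in pairwise.items():
    print(f"{s} vs {t}: {value:.2f}")
```

An index near 0 means that two tissues share few of their detected miRNAs, which is the pattern expected when expression is strongly tissue-dependent.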
Significance

MicroRNAs (miRNAs) are small ∼22-nt RNAs that are important regulators of posttranscriptional gene expression. Since their initial discovery, they have been shown to be involved in many cellular processes, and their misexpression is associated with disease etiology. Currently, nearly 2,800 human miRNAs are annotated in public repositories. A key question in miRNA research is how many miRNAs are harbored by the human genome. To answer this question, we examined 1,323 short RNA sequence samples and identified 3,707 novel miRNAs, many of which are human-specific and tissue-specific. Our findings suggest that the human genome expresses a greater number of miRNAs than has previously been appreciated and that many more miRNA molecules may play key roles in disease etiology.