Somatic mutations in healthy tissues contribute to aging, neurodegeneration, and cancer initiation, yet remain largely uncharacterized. To gain a better understanding of their distribution and functional impacts, we leveraged the genomic information contained in the transcriptome to uniformly call somatic mutations from over 7,500 tissue samples, representing 36 distinct tissues. This catalog, containing over 280,000 mutations, revealed a wide diversity of tissue-specific mutation profiles associated with gene expression levels and chromatin states. We found pervasive negative selection acting on missense and nonsense mutations, except for mutations previously observed in cancer samples, which were under positive selection and were highly enriched in many healthy tissues. These findings reveal fundamental patterns of tissue-specific somatic evolution and shed light on aging and the earliest stages of tumorigenesis.
RESULTS
Somatic mutation calling across 36 non-disease tissues from 547 people

Our method considers genomic positions where we observed two alleles in the RNA-seq reads and assesses whether they are likely to be bona fide DNA somatic mutations (Fig. 1a). We optimized the minimum levels of RNA-seq read depth, sequence quality, and number of reads supporting the variant allele to limit the impact of sequencing errors (Supp. Fig. 1a; see Methods). We then applied extensive filters to eliminate false positives from biological and technical sources, including RNA editing, sequencing errors, and mapping errors (see Methods; Fig. 1b-Fig. 2a; Supp. Table 1).

To validate the method, we compared somatic mutation calls from 105 blood RNA-seq samples to exome DNA sequencing performed on the same samples 23 (Fig. 1d). We observed a false-discovery rate (FDR) of 29%, which represents the percentage of somatic mutations called from RNA-seq that have no evidence in the corresponding DNA exome-seq sample (Fig. 1d, Supp. Fig. 1c; see Methods). This is comparable to the 40% FDR in a previous study that inferred mutations from scRNA-seq in pancreas 22.

After applying the pipeline and filters to RNA-seq data from the GTEx project, we retained a total of 7,584 samples from 36 different tissues and 547 different individuals with no detectable cancer (Supp. Table 2). This resulted in a total of 280,843 unique mutations (Supp. Table 3), most of which were rare across the entire data set (median frequency = 0.026% of samples; Supp. Fig. 2b).

We first investigated the factors influencing mutation counts per sample and tissue (see Methods). The main contributor was sequencing depth, followed to a lesser extent by other biological and technical factors (Supp. Fig. 2c, Supp. Table 4).
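
Because sequencing depth dominates the raw mutation count, samples and tissues can only be compared after adjusting for it. The sketch below shows one simple way to do this, assuming a least-squares fit of log(count + 1) on log(depth) and treating the residual as the depth-adjusted mutation load. This is an illustrative assumption, not the model described in Methods, which also accounts for other biological and technical factors. The function names and the toy values are hypothetical.

```python
import math
from collections import defaultdict


def fit_log_linear(depths, counts):
    """Least-squares fit of log(count + 1) on log(depth).

    Returns (slope, intercept). Assumed model for illustration only.
    """
    xs = [math.log(d) for d in depths]
    ys = [math.log(c + 1) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def depth_residuals(depths, counts):
    """Observed minus depth-expected log mutation count, per sample."""
    slope, intercept = fit_log_linear(depths, counts)
    return [
        math.log(c + 1) - (intercept + slope * math.log(d))
        for d, c in zip(depths, counts)
    ]


def mean_residual_by_tissue(tissues, residuals):
    """Average the per-sample residuals within each tissue label."""
    grouped = defaultdict(list)
    for tissue, residual in zip(tissues, residuals):
        grouped[tissue].append(residual)
    return {tissue: sum(vals) / len(vals) for tissue, vals in grouped.items()}


if __name__ == "__main__":
    # Hypothetical toy samples: (tissue, sequencing depth in reads, mutation count).
    samples = [
        ("skin", 1e6, 20), ("skin", 4e6, 80),
        ("brain", 1e6, 5), ("brain", 4e6, 20),
    ]
    tissues, depths, counts = zip(*samples)
    residuals = depth_residuals(depths, counts)
    for tissue, value in mean_residual_by_tissue(tissues, residuals).items():
        print(f"{tissue}: mean depth-adjusted residual = {value:+.3f}")
```

A positive mean residual marks a tissue with more mutations than its sequencing depth alone would predict, and a negative one marks a tissue with fewer.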
Tissues that have more mutations than expected from sequencing depth include those most often exposed to environmental mutagens or with a high cellular turnover, such as skin, lung, blood, esophagus mucosa, spleen, liver and small intestine (Fig. 2a). On the other end of the spectrum are those with low environmental exposure or low cellular turnover such as brain, adre...