Background
The role of synonymous single-nucleotide variants in human health and disease is poorly understood, yet evidence suggests that this class of “silent” genetic variation plays multiple regulatory roles in both transcription and translation. One mechanism by which synonymous codons direct and modulate the translational process is through alteration of the elaborate structure formed by single-stranded mRNA molecules. While tools to computationally predict the effect of non-synonymous variants on protein structure are plentiful, analogous tools to systematically assess how synonymous variants might disrupt mRNA structure are lacking.
Results
We developed novel software that uses a parallel processing framework for large-scale generation of RNA secondary structures and folding statistics for the transcriptome of any species. Focusing our analysis on the human transcriptome, we calculated 5 billion RNA-folding statistics for 469 million single-nucleotide variants in 45,800 transcripts. By considering the impact of all possible synonymous variants globally, we found that synonymous variants predicted to disrupt mRNA structure have significantly lower rates of incidence in the human population.
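As an illustration of what "all possible synonymous variants" means for a single coding sequence, the following minimal sketch enumerates every single-nucleotide substitution that leaves the encoded amino acid unchanged. It is not the authors' software: the function name `synonymous_snvs`, the 0-based coordinates, and the use of the standard genetic code are assumptions made for this example, and no RNA folding is performed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (assumed here; other
# translation tables, e.g. mitochondrial, would differ).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_snvs(cds):
    """Yield (position, ref, alt) for each synonymous SNV in a coding sequence.

    `cds` is an in-frame DNA coding sequence; positions are 0-based.
    Stop-to-stop changes are reported as synonymous because the
    translated symbol ('*') does not change.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for offset in range(3):
            for alt in BASES:
                if alt == codon[offset]:
                    continue  # not a substitution
                mutant = codon[:offset] + alt + codon[offset + 1:]
                if CODON_TABLE[mutant] == ref_aa:
                    yield (start + offset, codon[offset], alt)


if __name__ == "__main__":
    for variant in synonymous_snvs("ATGGGTCTG"):
        print(variant)
```

In a pipeline of the kind described above, each variant produced this way would then be passed to an RNA-folding tool to compare the reference and mutant structures; that step is outside the scope of this sketch.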
Conclusions
These findings support the hypothesis that synonymous variants may play a role in genetic disorders through their effects on mRNA structure. To evaluate the potential pathogenic impact of synonymous variants, we provide RNA stability, edge distance, and diversity metrics for every nucleotide in the human transcriptome and introduce a “Structural Predictivity Index” (SPI) to quantify structural constraint operating on any synonymous variant. Because no single RNA-folding metric can capture the diversity of mechanisms by which a variant could alter mRNA secondary structure, we generated a SUmmarized RNA Folding (SURF) metric that provides a single measurement for predicting the impact of secondary-structure-altering variants in human genetic studies.
Genome-wide association studies (GWAS) have identified more than 200 genomic loci for breast cancer risk, but the specific causal genes in most of these loci have not been identified. In fact, transcriptome-wide association studies (TWAS) of breast cancer performed using gene expression prediction models trained in breast tissue have yet to clearly identify most target genes. To identify novel candidate genes, we performed a joint TWAS analysis that combined TWAS signals from multiple tissues. We used expression prediction models trained in 47 tissues from the Genotype-Tissue Expression data using a multivariate adaptive shrinkage method, along with association summary statistics from the Breast Cancer Association Consortium and UK Biobank data. We identified 380 genes at 129 genomic loci that were significantly associated with breast cancer at the Bonferroni threshold (p < 2.36E-6). Of these, 29 genes were located in 11 novel regions that were at least 1 Mb away from published GWAS hits. The remaining TWAS-significant genes were located in 118 known genomic loci from previous GWAS of breast cancer. After conditioning on previous GWAS index variants, we found that 22 genes located in known GWAS loci remained statistically significant. Our study maps potential target genes in more than half of known GWAS loci and discovers multiple new loci, providing new insights into breast cancer genetics.
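The Bonferroni threshold quoted above can be related to the number of tests with a short calculation. The abstract does not state the number of genes tested or the family-wise error rate, so the sketch below assumes the conventional alpha of 0.05 and back-calculates the implied test count. Both function names are hypothetical and chosen for this example.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests implied by a reported Bonferroni threshold."""
    return alpha / threshold


if __name__ == "__main__":
    ALPHA = 0.05  # assumed family-wise error rate; not stated in the abstract
    n_tests = round(implied_number_of_tests(ALPHA, 2.36e-6))
    print(n_tests)
    print(bonferroni_threshold(ALPHA, n_tests))
```

Under that assumption the threshold corresponds to roughly 21,000 tests, which is a plausible order of magnitude for a gene-level analysis but is an inference, not a figure reported by the authors.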