Purpose
Prior studies have demonstrated the significance of specific
cis
-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we utilized a machine learning approach, trained on epigenomic data from the adult human retina, to systematically quantify the predicted impact of
cis
-regulatory variants.
Methods
We used human retinal DNA accessibility data (ATAC-seq) to determine a set of 18.9k high-confidence, putative
cis
-regulatory elements. Eighty percent of these elements were used to train a machine learning model utilizing a gapped
k
-mer support vector machine–based approach. In silico saturation mutagenesis and variant scoring was applied to predict the functional impact of all potential single nucleotide variants within
cis
-regulatory elements. Impact scores were tested in a 20% hold-out dataset and compared to allele population frequency, phylogenetic conservation, transcription factor (TF) binding motifs, and existing massively parallel reporter assay data.
Results
We generated a model that distinguishes between human retinal regulatory elements and negative test sequences with 95% accuracy. Among a hold-out test set of 3.7k human retinal CREs, all possible single nucleotide variants were scored. Variants with negative impact scores correlated with higher phylogenetic conservation of the reference allele, disruption of predicted TF binding motifs, and massively parallel reporter expression.
Conclusions
We demonstrated the utility of human retinal epigenomic data to train a machine learning model for the purpose of predicting the impact of non-coding regulatory sequence variants. Our model accurately scored sequences and predicted putative transcription factor binding motifs. This approach has the potential to expedite the characterization of pathogenic non-coding sequence variants in the context of unexplained retinal disease.
Translational Relevance
This workflow and resulting dataset serve as a promising genomic tool to facilitate the clinical prioritization of functionally disruptive non-coding mutations in the retina.
The complex genomic landscape of prostate cancer evolves across disease states under therapeutic pressure directed toward inhibiting androgen receptor (AR) signaling. While significantly altered genes in prostate cancer have been extensively defined, there have been fewer systematic analyses of how structural variation shapes the genomic landscape of this disease across disease states. We uniformly characterized structural alterations across 531 localized and 143 metastatic prostate cancers profiled by whole genome sequencing, 125 metastatic samples of which were also profiled via whole transcriptome sequencing. We observed distinct significantly recurrent breakpoints in localized and metastatic castration-resistant prostate cancers (mCRPC), with pervasive alterations in noncoding regions flanking the AR, MYC, FOXA1, and LSAMP genes enriched in mCRPC and TMPRSS2-ERG rearrangements enriched in localized prostate cancer. We defined nine subclasses of mCRPC based on signatures of structural variation, each associated with distinct genetic features and clinical outcomes. Our results comprehensively define patterns of structural variation in prostate cancer and identify clinically actionable subgroups based on whole genome profiling.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.