A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease

Smedley, Damian; Schubach, Max; Jacobsen, Julius O. B.; Köhler, Sebastian; Żemojtel, Tomasz; Spielmann, Malte; Jäger, Marten; Hochheiser, Harry; Washington, Nicole; McMurry, Julie A.; Haendel, Melissa; Mungall, Christopher J.; Lewis, Suzanna; Groza, Tudor; Valentini, Giorgio; Robinson, Peter N.

doi:10.1016/j.ajhg.2016.07.005

Cited by 245 publications

(274 citation statements)

References 74 publications

Supporting

Mentioning

269

Contrasting

Order By: Relevance

“…Scores for all possible SNVs genome-wide were downloaded from CADD 24 (http://cadd.gs.washington.edu/download), Genomiser 41 (https://charite.github.io/software-remm-score.html#download), and fathmm-MKL 42 (https://github.com/HAShihab/fathmm-MKL). …”

Section: Methodsmentioning

confidence: 99%

De novo mutations in regulatory elements in neurodevelopmental disorders

Short

McRae

Gallone

et al. 2018

Nature

245

219

View full text Add to dashboard Cite

We previously estimated that 42% of patients with severe developmental disorders carry pathogenic de novo mutations in coding sequences. The role of de novo mutations in regulatory elements affecting genes associated with developmental disorders, or other genes, has been essentially unexplored. We identified de novo mutations in three classes of putative regulatory elements in almost 8,000 patients with developmental disorders. Here we show that de novo mutations in highly evolutionarily conserved fetal brain-active elements are significantly and specifically enriched in neurodevelopmental disorders. We identified a significant twofold enrichment of recurrently mutated elements. We estimate that, genome-wide, 1-3% of patients without a diagnostic coding variant carry pathogenic de novo mutations in fetal brain-active regulatory elements and that only 0.15% of all possible mutations within highly conserved fetal brain-active elements cause neurodevelopmental disorders with a dominant mechanism. Our findings represent a robust estimate of the contribution of de novo mutations in regulatory elements to this genetically heterogeneous set of disorders, and emphasize the importance of combining functional and evolutionary evidence to identify regulatory causes of genetic disorders.

show abstract

Section: Methodsmentioning

confidence: 99%

De novo mutations in regulatory elements in neurodevelopmental disorders

Short

McRae

Gallone

et al. 2018

Nature

245

219

View full text Add to dashboard Cite

show abstract

“…We subdivided the Mendelian data that include 406 manually annotated "positive" deleterious SNV and more than 14 millions of neutral "negative" SNVs in a training set including about 9/10 of the available data and a separated test set including the remaining 1/10 of data, using the same set of genomic features described in [Smedley et al, 2016]. We then compared the hyperSMURF results obtained by using the default parameters (i.e.…”

Section: Resultsmentioning

confidence: 99%

“…Indeed in this context classical machine learning methods are biased toward neutral variants that constitute the large majority of genetic variation, and are not able to detect the potential deleterious variants that constitute only a tiny minority of all known genetic variation [Smedley et al, 2016].…”

Section: Introductionmentioning

confidence: 99%

Parameters tuning boosts hyperSMURF predictions of rare deleterious non-coding genetic variants

Petrini

Schubach

Ré

et al. 2017

Preprint

Self Cite

View full text Add to dashboard Cite

The regulatory code that determines whether and how a given genetic variant affects the function of a regulatory element remains poorly understood for most classes of regulatory variation. Indeed the large majority of bioinformatics tools have been developed to predict the pathogenicity of genetic variants in coding sequences or conserved splice sites.Computational algorithms for the prediction of non-coding deleterious variants associated with rare genetic diseases are faced with special challenges owing to the rarity of confirmed pathogenic mutations. Indeed in this context classical machine learning methods are biased toward neutral variants that constitute the large majority of genetic variation, and are not able to detect the potential deleterious variants that constitute only a tiny minority of all known genetic variation. We recently proposed hyperSMURF, hyperensemble of SMOTE Undersampled Random Forests, an ensemble approach explicitly designed to deal with the huge imbalance between deleterious and neutral variants, and able to significantly outperform state-of-the-art methods for the prediction of non-coding variants associated with Mendelian diseases. Despite its successful application to the detection of deleterious single nucleotide variants (SNV) as well as to small insertions or deletions (indels), hyperSMURF is a method that depends on several learning parameters, that strongly influence its overall performances. In this work we experimentally show that by tuning hyperSMURF parameters we can significantly boost the performance of the method, thus predicting with significantly better precision and recall rare SNVs associated with Mendelian diseases.PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.3185v1 | CC BY 4.0 Open Access |

show abstract

“…However, it should be noted that non-coding variant prioritization tools are less accurate than their protein-coding counterparts. Although many new approaches are being developed 39,94,95 , there is simply insufficient understanding of the regulatory machinery encrypted in non-coding DNA to prioritize non-coding variants with similar accuracy to that of coding variants 94 .…”

Section: Current Challenges and Emerging Solutionsmentioning

confidence: 99%

Settling the score: variant prioritization and Mendelian disease

2017

View full text Add to dashboard Cite

When investigating Mendelian disease using exome or genome sequencing, distinguishing disease-causing genetic variants from the multitude of candidate variants is a complex, multidimensional task. Many prioritization tools and online interpretation resources exist, and professional organizations have offered clinical guidelines for review and return of prioritization results. In this Review, we describe the strengths and weaknesses of widely used computational approaches, explain their roles in the diagnostic and discovery process and discuss how they can inform (and misinform) expert reviewers. We place variant prioritization in the wider context of gene prioritization, burden testing and genotype–phenotype association, and we discuss opportunities and challenges introduced by whole-genome sequencing.

show abstract

A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease

Cited by 245 publications

References 74 publications

De novo mutations in regulatory elements in neurodevelopmental disorders

De novo mutations in regulatory elements in neurodevelopmental disorders

Parameters tuning boosts hyperSMURF predictions of rare deleterious non-coding genetic variants

Settling the score: variant prioritization and Mendelian disease

Contact Info

Product

Resources

About