Although rare missense variants underlying a number of Mendelian diseases have been noted to cluster in specific regions of proteins, this information may be underutilized when evaluating the pathogenicity of a gene or variant. We introduce ClusterBurden and GAMs, two methods for rapid association testing and predictive modelling, respectively, that combine variant burden and amino-acid residue clustering in case-control studies. We show that ClusterBurden increases statistical power to identify disease genes driven by missense variants, both in simulated data and in an experimental 34-gene panel dataset for hypertrophic cardiomyopathy. We then demonstrate that GAMs can be used to apply the ACMG criteria PM1 and PP3 quantitatively, and to resolve a wide range of pathogenicity potential amongst variants of uncertain significance. An R package is available for association testing using ClusterBurden, and a web application (Pathogenicity_by_Position) is available for missense variant risk prediction using GAMs for six sarcomeric genes. In conclusion, the inclusion of amino-acid residue positional information enhances the accuracy of gene and rare variant pathogenicity interpretation.
Author Summary
Two statistical methods have been developed that utilize the signal in the residue position of missense variants. The first is a rapid association method that tests the joint hypothesis of an excess of rare variants and rare-variant clustering. The method, ClusterBurden, is powerful when rare missense variants cluster in discrete pathogenic regions of the protein. It can be applied to exome scans to discover novel Mendelian disease genes that may not be identified by classic burden testing. The second method is a statistical model for rare missense variant interpretation. It provides superior predictive performance compared to generic in silico predictors by training on our large case-control dataset. The method represents a data-driven, quantitative approach to applying the hotspot and in silico prediction criteria from the ACMG variant interpretation guidelines.
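To make the idea of a joint burden-and-clustering test concrete, the following is a minimal Python sketch, not the ClusterBurden implementation itself. It assumes a one-sided Fisher exact test for the burden signal, a permutation test on a Kolmogorov-Smirnov-style statistic for the positional signal, and Fisher's method to combine the two p-values (which treats them as independent under the null). All function names and the choice of component tests are illustrative assumptions.

```python
import bisect
import math
import random


def fisher_exact_greater(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact p-value for an excess of carriers among cases."""
    total_carriers = case_carriers + control_carriers
    denom = math.comb(n_cases + n_controls, total_carriers)
    upper = min(n_cases, total_carriers)
    # Sum hypergeometric probabilities of tables at least as extreme.
    numer = sum(
        math.comb(n_cases, k) * math.comb(n_controls, total_carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return min(1.0, numer / denom)


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two sets of residue positions."""
    sa, sb = sorted(a), sorted(b)
    points = sorted(set(sa) | set(sb))
    return max(
        abs(bisect.bisect_right(sa, x) / len(sa) - bisect.bisect_right(sb, x) / len(sb))
        for x in points
    )


def position_permutation_p(case_pos, control_pos, n_perm=2000, seed=0):
    """Permutation p-value: do case and control variants differ in position?"""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ks_statistic(case_pos, control_pos)
    pooled = list(case_pos) + list(control_pos)
    n_case = len(case_pos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_case], pooled[n_case:]) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def fisher_combine_two(p1, p2):
    """Fisher's method for two p-values (chi-square with 4 degrees of freedom)."""
    x = -2.0 * (math.log(p1) + math.log(p2))
    # Closed-form survival function of chi-square with 4 df.
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def joint_burden_clustering_p(case_pos, n_cases, control_pos, n_controls):
    """Combine burden and positional evidence for one gene (illustrative only).

    Assumes each listed position comes from a distinct carrier.
    """
    p_burden = fisher_exact_greater(len(case_pos), n_cases, len(control_pos), n_controls)
    p_position = position_permutation_p(case_pos, control_pos)
    return fisher_combine_two(p_burden, p_position)
```

In this sketch, a gene whose case variants are both more numerous than expected and concentrated in a different region of the protein than control variants receives a smaller combined p-value than either component test would give alone, which is the intuition behind testing the joint hypothesis.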