Protein-coding genetic variants are the first to be considered in most studies and Precision Medicine workflows, but their interpretation relies primarily on DNA sequence-based analytical tools and annotations. More specific and mechanistic interpretations should therefore be attainable by integrating DNA-based scores with scores derived from the protein 3D structure. However, reliable and reproducible standardization of methods that use 3D structure to interpret genomic variation is still lacking. Further, we believe that the current paradigm, which aims to predict the pathogenicity of variants directly, skips the critical step of inferring, with precision, the molecular mechanisms of dysfunction. Here we report the development and evaluation of single and composite 3D structure-based scores and their integration with protein and DNA sequence-based scores, to better understand not only whether a genomic variant alters a protein, but also how. We believe this is a critical step for understanding mechanistic changes caused by genomic variants, for designing functional validation tests, and for improving disease classifications. We applied this approach to the RAS gene family, which encodes seven distinct proteins, and to their 935 unique missense variants found somatically in cancer, in rare diseases (termed RASopathies), and in the currently healthy adult population. The results show that protein structure-based scores are distinct from the information available from genomic annotation, that they are useful for interpreting genomic variants, and that they should be taken into consideration in future guidelines for genomic data interpretation.
Significance Statement
Genetic information from patients is a powerful data type for understanding individual differences in disease risk and treatment, but most of the genetic variation we observe has no mechanistic interpretation. This lack of interpretation limits the use of genomic data in clinical care. Standard methods for genomic data interpretation take advantage of annotations available for the human reference genome, but they do not consider the 3D protein molecule. We believe that changes to the 3D molecule must be considered to augment current practice and lead to more precise interpretation. In this work, we present our initial process for computing systematic, multi-level molecular scores, including 3D structure-based scores, and use them to interrogate 935 RAS-family variants that are relevant to both cancer and rare diseases.