The increasing ease with which large amounts of genetic information can be obtained from patients or healthy individuals has stimulated the development of interpretive bioinformatics tools as aids in clinical practice. Most such tools analyze evolutionary information and simple physicochemical properties to predict whether replacing one amino acid residue with another will be tolerated or cause disease. These approaches achieve up to 80–85% accuracy as binary classifiers (neutral/pathogenic). Because that accuracy is insufficient to base medical decisions on and does not appear to be increasing, more precise methods, such as full-atom molecular dynamics (MD) simulations in explicit solvent, are also discussed. To describe the goal of interpreting human genetic variation at large scale through MD simulations, we restrictively define the human variome as the set of all possible protein variants carrying single-amino-acid substitutions that arise from single-nucleotide variations. We calculate its size and develop a simple model for calculating the simulation time needed to have a 0.99 probability of observing unfolding events of any unstable variant. Knowing that time enables a binary classification of the variants (stable and potentially neutral, or unstable and pathogenic). Our model indicates that the human variome cannot be simulated with present computing capabilities. However, if those capabilities continue to increase as per Moore’s law, it could be simulated (at 65°C) in only 3 years if we started in 2031. The simulation of individual protein variomes is already achievable in short times. International coordination seems appropriate to embark upon massive MD simulations of protein variants.
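The abstract does not spell out the model, so the following is only a minimal sketch of the kind of calculation it describes, under two assumptions that are ours and not the authors': unfolding is treated as a first-order (single-exponential) process with rate constant k, so the probability of seeing at least one event within time t is 1 - exp(-k t), and computing capacity is taken to double at a fixed period (two years here, a common reading of Moore's law). The function names and parameters are hypothetical.

```python
import math


def time_for_detection(rate: float, p: float = 0.99) -> float:
    """Simulation time needed to observe an unfolding event with probability p.

    Assumes first-order kinetics (an assumption, not stated in the abstract):
    P(t) = 1 - exp(-rate * t), hence t = -ln(1 - p) / rate.
    The result is in the inverse of the units used for `rate`.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    return -math.log(1.0 - p) / rate


def years_to_finish(total_work: float, capacity_now: float,
                    start_delay_years: float = 0.0,
                    doubling_years: float = 2.0) -> float:
    """Wall-clock years needed to complete `total_work`.

    Capacity (work per year) is assumed to grow as
    capacity_now * 2 ** (t / doubling_years), and the campaign starts
    `start_delay_years` from now. Integrating the capacity over the
    campaign and solving for its duration gives the closed form below.
    """
    if total_work < 0.0 or capacity_now <= 0.0 or doubling_years <= 0.0:
        raise ValueError("work must be >= 0; capacity and doubling period > 0")
    capacity_at_start = capacity_now * 2.0 ** (start_delay_years / doubling_years)
    ratio = total_work * math.log(2.0) / (capacity_at_start * doubling_years)
    return doubling_years * math.log2(1.0 + ratio)
```

Under the first-order assumption, a 0.99 detection probability corresponds to simulating for about 4.6 mean unfolding times (ln 100 ≈ 4.6), whatever the rate. Delaying the start shortens the campaign because the capacity available at the starting date is larger, which is the trade-off behind choosing a future start year.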
Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease. Mutations in MYBPC3, the gene encoding cardiac myosin-binding protein C (cMyBP-C), are a leading cause of HCM. However, it remains challenging to determine whether specific gene variants found in patients are pathogenic, which limits the reach of cardiovascular genetics in the management of HCM. Here, we have examined drivers of cMyBP-C haploinsufficiency in 68 clinically annotated non-truncating variants of MYBPC3. We find that 45% of the pathogenic variants show alterations in RNA splicing or protein stability, which can be linked to pathogenicity with 100% and 94% specificity, respectively. Of relevance for variant annotation, we find that 9% of the non-truncating MYBPC3 variants currently classified as of uncertain significance induce one of these molecular phenotypes. We propose that alteration of RNA splicing or protein stability caused by MYBPC3 variants provides strong evidence of their pathogenicity, leading to improved clinical management of HCM patients and their families.
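For readers less familiar with the metric, specificity is the fraction of non-pathogenic variants that do not show the molecular phenotype: TN / (TN + FP). The sketch below only illustrates that arithmetic. The function name is hypothetical, and the counts in the usage lines are invented for illustration; they are not taken from the study.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of truly non-pathogenic variants that test negative.

    true_negatives:  non-pathogenic variants without the molecular phenotype
    false_positives: non-pathogenic variants that nevertheless show it
    """
    if true_negatives < 0 or false_positives < 0:
        raise ValueError("counts must be non-negative")
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("at least one non-pathogenic variant is required")
    return true_negatives / total


# Hypothetical counts, chosen only to illustrate the formula.
no_false_positives = specificity(true_negatives=20, false_positives=0)
few_false_positives = specificity(true_negatives=47, false_positives=3)
```

A specificity of 100% means that no non-pathogenic variant in the reference set showed the phenotype, which is why a positive result carries strong evidential weight even when many pathogenic variants test negative.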