Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

Zhao, Nan; Han, Jilong; Shyu, Chi‐Ren; Korkin, Dmitry

doi:10.1371/journal.pcbi.1003592

Cited by 76 publications

(67 citation statements)

References 82 publications

“…A recent study used SKEMPI mutants to train a classifier for nsSNPs that affect protein interactions, using three classes: no effect, diminished binding, and enhanced binding [51]. However, the classifier did not define a class of "non-binders", as in the present study.…”

Section: Classes Of Altered Binding

mentioning

confidence: 72%

“…Such a classifier could aid in the development of more sophisticated free energy (∆G) scoring functions. There is preliminary evidence that disease-causing nsSNPs that alter protein interactions act through distinct mechanisms [51]. The functional insight that future tools such as the one in the present study might shed on interaction-altering human SNPs would prove invaluable to the current understanding of human genetic variation in disease.…”

Section: Classes Of Altered Binding

mentioning

confidence: 73%

Docking features for predicting binding loss due to protein mutation

Goodacre, Edwards, Danielsen, et al. 2014

Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

The human genome contains a large number of protein polymorphisms due to individual genome variation. How many of these polymorphisms lead to altered protein-protein interaction is unknown. We have developed a method that uses docking simulations to predict whether variants have altered interactions with their binding partners. A novel docking score normalization that compares the docking of mutant-containing protein pairs to that of the wild-type pair is introduced. Using the SKEMPI database and CAPRI, a training set of 167 mutant pairs (87 binders, 80 non-binders) was identified and docked using the docking program HADDOCK. A random forest classifier was trained on the differences in the resulting docking scores for the 167 mutant pairs to distinguish between variants that have either completely or partially lost binding ability. 50% of non-binders were correctly predicted with a false discovery rate of only 2%. This allows for the rapid identification of a large number of protein polymorphisms that are likely to have a physiological consequence. The model was tested on a set of 15 HIV-1-human, as well as 7 human-human glioblastoma-related, mutant protein pairs: 50% of combined non-binders were correctly predicted with a false discovery rate of 10%.

“…Machine-learning-based scoring functions use a variety of mostly supervised machine-learning algorithms [158,159], such as artificial neural networks [160], random forests [161–163], and support vector machines [164], to learn about specific energetic or other structural or biological properties using a training set of protein structures. The resulting trained machine-learning-based function can then be used to produce a scoring value associated with a predicted property:

Input: Descriptors → Trained Scoring Function → Output: Scoring Value

…”

Section: Challenges In Automated Protein Design

mentioning

confidence: 99%

Recent advances in automated protein design and its future challenges

Setiawan, Brender, Zhang 2018

Expert Opinion on Drug Discovery

Introduction: Protein function is determined by protein structure, which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
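
The pipeline quoted in the citation statement above (descriptors, then a trained scoring function, then a scoring value) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the helper name `train_scoring_function`, the toy descriptors, and the toy target values are hypothetical, and a k-nearest-neighbour average stands in for the neural networks, random forests, and support vector machines that the cited review actually names.

```python
from math import dist
from typing import Callable, Sequence

Descriptor = Sequence[float]


def train_scoring_function(
    descriptors: Sequence[Descriptor],
    values: Sequence[float],
    k: int = 2,
) -> Callable[[Descriptor], float]:
    """Return a scoring function fitted to (descriptor, value) training pairs.

    A k-nearest-neighbour average is used only as a simple stand-in for the
    learners named in the quoted text; it is not the method of any cited paper.
    """
    if len(descriptors) != len(values) or not descriptors:
        raise ValueError("need equally many descriptors and values, at least one")
    k = min(k, len(descriptors))
    training = list(zip(descriptors, values))

    def score(descriptor: Descriptor) -> float:
        # Rank training examples by Euclidean distance to the query descriptor.
        nearest = sorted(training, key=lambda pair: dist(pair[0], descriptor))[:k]
        # The scoring value is the mean of the k closest training values.
        return sum(value for _, value in nearest) / k

    return score


# Hypothetical toy data: two descriptors per structure and a made-up target value.
toy_descriptors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
toy_values = [1.0, 3.0, 10.0, 12.0]

# Input: descriptors -> trained scoring function -> output: scoring value.
scoring_function = train_scoring_function(toy_descriptors, toy_values, k=2)
print(scoring_function((0.1, 0.4)))
print(scoring_function((5.5, 5.0)))
```

Returning a closure keeps the training step and the scoring step separate, which mirrors the two stages of the quoted pipeline; any other learner could be swapped in behind the same interface.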

“…A prominent example is the sickle-cell disease [2]. However, even though many non-synonymous SNPs are known, for the majority of them the corresponding structural change is still unknown [3]. In personalized medicine, whole exome sequencing can lead to the detection of several thousand SNPs per sample.…”

mentioning

confidence: 99%

SNPViz - Visualization of SNPs in proteins

Seitz, Koch, Nieselt 2017

Genomics Comput Biol

SUMMARY: In personalized medicine, SNPs are used to identify specific diseases of a patient. However, for many SNPs, no information about the pathogenicity is available. Current programs try to predict the effect of a SNP on the function of a protein, but give no possibility for visual interpretation. We have developed SNPViz, a program that first finds 3D structures of affected proteins and then highlights the affected amino acid in the 3D structure. This can give researchers and doctors more information about the probable pathogenicity of the SNP. In the future, we plan to add also further information, such as whether the position of the SNP is in a binding domain, or is involved in a protein-protein interaction.

KEYWORDS: SNP; Visualization; Protein; Personalized medicine

AVAILABILITY AND REQUIREMENTS: The git repository containing the program is available at https://lambda.informatik.uni-tuebingen.de/gitlab/seitz/snpviz

CONFLICT OF INTEREST: The authors declare no conflict of interest.

BODY: Single nucleotide polymorphisms (SNPs) are the most common genetic variations between humans and many are believed to be causative for phenotypic differences [1]. Non-synonymous (ns) SNPs, meaning SNPs that result in a substitution of the amino acid in the corresponding protein, are known to be the possible cause of structural changes. A prominent example is the sickle-cell disease [2]. However, even though many non-synonymous SNPs are known, for the majority of them the corresponding structural change is still unknown [3]. In personalized medicine, whole exome sequencing can lead to the detection of several thousand SNPs per sample. However, which of them could be responsible for the cause of the disease remains unclear [4]. One tool available to predict the impact of a SNP on the protein function is SIFT [5]. Our idea is to look at the putative structural changes that could be the result of a mutation in a protein in order to gain insight into possible disease related SNPs. Also, researchers often want to see the protein and exact position of the amino acid subject to mutation. This, together with an automated pipeline that first finds the respective protein in PDB, then identifies the amino acid(s) affected by the nsSNPs, and finally visualizes this result, does not exist (to our knowledge). Here, we present a tool that can highlight affected positions in the 3D structure of the corresponding proteins. For this we developed the Java tool SNPViz. It can give insights regarding SNPs for which no pathogenic effect is known. It first identifies the exons that are affected by SNPs of interest. These exons are then mapped to the corresponding proteins using the ID mapping of UniProt [6], a database containing multiple gene annotations like ENSEMBL [7], the Protein Data Bank (PDB) [8], and more. Afterwards, if existing, the corresponding 3D structures for identified proteins are downloaded from the PDB. The affected exons are then translated to all 6 possible amino acid sequences. Next, the position of the exon within the protein is ide...
