2015
DOI: 10.1002/prot.24775

Combining features in a graphical model to predict protein binding sites

Abstract: Large efforts have been made in classifying residues as binding sites in proteins using machine learning methods. The prediction task can be translated into the computational challenge of assigning each residue the label binding site or non-binding site. Observational data come from various, possibly highly correlated, sources. They include the structure of the protein but not the structure of the complex. The model class of conditional random fields (CRFs) has previously been used successfully for protein bindi…

Cited by 7 publications (6 citation statements)
References 35 publications
“…Sequence-based methods that predict protein–ligand binding sites and their interacting ligand-binding site residues are those that use information from evolutionary conservation and/or sequence similarity of homologous proteins. These methods can be broadly categorised into methods that utilize machine learning (Multi-RELIEF [ 11 ], TargetS [ 12 ], LigandRF [ 13 ], and OMSL [ 14 ]), methods that utilize only position-specific scoring matrices or PSSMs (INTREPID [ 15 ], DISCERN [ 16 ], ConSurf [ 17 ], and ConFunc [ 18 ]) and graph-based methods such as Conditional Random Field (CRF) [ 19 ]. The advent of including machine learning-based strategies into sequence-based methods has resulted in improved method sensitivity.…”
Section: In Silico Methods For the Prediction O… (mentioning)
confidence: 99%
“…With such a large number of variables, the decomposition by the chain rule factorization is predominantly advantageous in terms of the number of operations saved. The popularity of PGI is largely due to the emergence of big data in diverse disciplines, from medicine to economics to social networks [22]-[24].…”
Section: Background and Motivation (mentioning)
confidence: 99%
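
The saving that this statement attributes to factorization can be illustrated with a small count. The sketch below is not taken from the citing paper: it assumes binary variables and a first-order Markov chain structure (both assumptions made only for illustration), and compares the number of free parameters in a full joint table with the number needed by the factorized form p(x1) p(x2 | x1) ... p(xn | xn-1).

```python
def full_joint_params(n):
    """Free parameters of an unrestricted joint table over n binary variables."""
    return 2 ** n - 1


def markov_chain_params(n):
    """Free parameters of p(x1) * prod_i p(x_i | x_{i-1}) over n binary variables.

    Assumed structure (illustrative only): each variable depends on its
    predecessor alone, so p(x1) needs 1 parameter and every conditional
    p(x_i | x_{i-1}) needs 2 (one per value of the predecessor).
    """
    return 1 + 2 * (n - 1)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, full_joint_params(n), markov_chain_params(n))
```

The full table grows exponentially in the number of variables while the factorized form grows linearly, which is the kind of gap that makes graphical models practical for long chains of variables such as the residues of a protein.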
“…A number of descriptors have been utilized for the purpose of PPI identification, such as hydrophobicity [ 5 ], energy of solvatation [ 6 ], propensity [ 5 ] or RASA (Relative Solvent Accessible Surface Area) [ 3 , 6 ], with RASA being especially popular [ 7 ]. As for machine learning approaches, the best performing methods utilize Support Vector Machines (SVM) [ 3 , 5 ], Neural networks [ 8 ], Decision trees [ 6 ] or Conditional Random Fields (CRF) [ 9 , 10 ].…”
Section: Introduction (mentioning)
confidence: 99%
“…The goal is to find the most probable labeling of hidden variables according to observations. Our approach was inspired by the CRF-based method presented by Dong et al [ 9 ] and Wierschin et al [ 10 ] where a protein is represented in a graph. In that representation, every amino acid corresponds to a node, and two nodes are connected by an edge if their corresponding amino acids are sufficiently close to each other.…”
Section: Introduction (mentioning)
confidence: 99%
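
The graph representation described in this statement (one node per amino acid, an edge between residues that are sufficiently close) can be sketched as follows. This is a minimal illustration, not the construction used in the cited papers: the function name `contact_graph`, the single representative point per residue, the 8.0 Å default cutoff, and the toy coordinates are all assumptions, since the statement says only "sufficiently close".

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def contact_graph(coords, cutoff=8.0):
    """Return the edge set {(i, j), i < j} of a residue contact graph.

    coords: one (x, y, z) point per residue, in angstroms (assumed here to be
            a single representative atom per residue).
    cutoff: distance threshold in angstroms; 8.0 is an assumed value, not one
            taken from the source.
    """
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if dist(a, b) <= cutoff:
            edges.add((i, j))
    return edges


# Hypothetical toy coordinates: three residues in a row and one far away.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (30.0, 0.0, 0.0),
]

if __name__ == "__main__":
    print(sorted(contact_graph(toy_coords)))
```

In a CRF built on such a graph, each node would carry the hidden label (binding site or non-binding site) and the edges would determine which pairs of labels interact; the choice of cutoff controls how dense the graph, and therefore the inference problem, becomes.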