High Precision Prediction of Functional Sites in Protein Structures

Buturović, Ljubomir; Wong, Mike; Tang, Gale L.; Altman, Russ B.; Petković, Dragutin

doi:10.1371/journal.pone.0091240

Cited by 15 publications

(25 citation statements)

References 22 publications

“…FEATURE data i.e. the training database used for RF training, contains feature vectors at known positive (functional site) and negative (background) class labels for each protein functional model [18]. FEATURE training data is highly imbalanced e.g.…”

Section: Case Study: RFEX Applied to Stanford FEATURE Data (mentioning)

confidence: 99%

“…there are two to three orders of magnitude more negative (background) vs. positive (functional sites) samples. For the work in this paper we used the same 7 FEATURE models selected in experiments in [5], which are subset of models analyzed in [18], see Table 1.…”

Section: Case Study: RFEX Applied to Stanford FEATURE Data (mentioning)

confidence: 99%
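The imbalance described in the statement above (two to three orders of magnitude more negatives than positives) is the reason results on such data are usually reported as precision and recall rather than plain accuracy. The following is a minimal sketch of that point in plain Python. The label vectors are hypothetical and constructed only for illustration; they are not FEATURE data, and the function names are not from the cited papers.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = site, 0 = background)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred):
    """Return (precision, recall, accuracy); precision/recall are 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Hypothetical labels at a 1:1000 positive-to-negative ratio (illustration only).
y_true = [1] * 10 + [0] * 10_000

# A trivial classifier that always predicts "background".
always_negative = [0] * len(y_true)

# A classifier that finds 8 of the 10 sites and raises 2 false alarms.
useful = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 9_998

print(precision_recall_accuracy(y_true, always_negative))
print(precision_recall_accuracy(y_true, useful))
```

The always-negative classifier scores above 99% accuracy while finding no sites at all, so accuracy alone cannot separate it from the useful classifier; precision and recall can.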

“…Common problem with this approach is still a large number of complex rules hard to interpret by humans and lack of tradeoffs between accuracy and number of rules used. Our prior work on explainability for RF was motivated by our original joint work with Stanford Helix team on applying Support Vector Machines (SVM) [18] and RF [5] to their FEATURE data [19] where we show very good classification results measured by high recall and precision. In [5] we made first attempts to improve explainability by using RF-provided variable importance measures but did not analyze positive vs. negative classes separately and achieved very limited explainability improvements.…”

Section: Introduction Background and Motivation (mentioning)

confidence: 99%

Improving the explainability of Random Forest classifier – user centered approach

et al. 2017

Self Cite

Machine Learning (ML) methods are now influencing major decisions about patient care, new medical methods, drug development and their use and importance are rapidly increasing in all areas. However, these ML methods are inherently complex and often difficult to understand and explain resulting in barriers to their adoption and validation. Our work (RFEX) focuses on enhancing Random Forest (RF) classifier explainability by developing easy to interpret explainability summary reports from trained RF classifiers as a way to improve the explainability for (often non-expert) users. RFEX is implemented and extensively tested on Stanford FEATURE data where RF is tasked with predicting functional sites in 3D molecules based on their electrochemical signatures (features). In developing RFEX method we apply user-centered approach driven by explainability questions and requirements collected by discussions with interested practitioners. We performed formal usability testing with 13 expert and non-expert users to verify RFEX usefulness. Analysis of RFEX explainability report and user feedback indicates its usefulness in significantly increasing explainability and user confidence in RF classification on FEATURE data. Notably, RFEX summary reports easily reveal that one needs very few (from 2-6 depending on a model) top ranked features to achieve 90% or better of the accuracy when all 480 features are used.
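The closing observation of this abstract, that a few top-ranked features recover 90% or more of the all-feature accuracy, amounts to a search along an accuracy-versus-feature-count curve. The following is a minimal sketch of that search. The function name and the accuracy values are made up for illustration and are not taken from the RFEX paper.

```python
def smallest_k_reaching(acc_by_k, full_accuracy, fraction=0.9):
    """Return the smallest feature count k whose accuracy is at least
    fraction * full_accuracy, or None if no listed k reaches it."""
    target = fraction * full_accuracy
    for k in sorted(acc_by_k):
        if acc_by_k[k] >= target:
            return k
    return None


# Made-up accuracies when training on only the top-k ranked features.
acc_by_k = {1: 0.62, 2: 0.81, 3: 0.88, 4: 0.90, 6: 0.93}
# Made-up accuracy when all features are used.
full_accuracy = 0.95

print(smallest_k_reaching(acc_by_k, full_accuracy))
```

With these illustrative numbers the target is 0.9 × 0.95 = 0.855, which the top-3 subset is the first to reach.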

“…They have also proven useful for identifying possible metal-binding sites from structure alone (Bordner, 2008;Buturovic et al, 2014). Here, we trained SVMs on information from the X-ray scattering and local chemical environment.…”

Section: Discussion (mentioning)

confidence: 99%

“…In the context of structural biology, these methods have shown success in the analysis of crystallization images (Pan et al, 2006) as well as in the prediction of binding and functional sites from both sequence (Lippi et al, 2012;Carugo, 2008) and structure (Brylinski & Skolnick, 2011;Buturovic et al, 2014), structural polymorphism (Takaya et al, 2013), the results of mutation experiments (Wei et al, 2013) and model building into electron density (Holton et al, 2000;Gopal et al, 2007). Here, we present an advance upon our previous method, in which we use support vector machines (SVMs) to classify sites as either water or one of various elemental ions.…”

Section: Introduction (mentioning)

confidence: 99%

Using support vector machines to improve elemental ion identification in macromolecular crystal structures

Morshed, Adams 2015, Acta Cryst D Biol Crystallogr

COLLAPSE: A representation learning framework for identification and characterization of protein structural sites

Derry, Altman 2023, Protein Science

Self Cite

The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site‐specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self‐supervision signal, enabling learned embeddings to implicitly capture structure–function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state‐of‐the‐art performance on standardized benchmarks (protein–protein interactions and mutation stability) and on the prediction of functional sites from the Prosite database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general‐purpose platform for computational protein analysis.
