Philip Kegelmeyer scite author profile

3 Publications

38 Citation Statements Received

65 Citation Statements Given

Affiliations

Sandia National Laboratories, Sandia National Laboratories California

Publications

PostDOCK: A Structural, Empirical Approach to Scoring Protein Ligand Complexes

et al. 2005

In this work we introduce a postprocessing filter (PostDOCK) that distinguishes true binding ligand-protein complexes from docking artifacts created by DOCK 4.0.1. PostDOCK is a pattern recognition system that relies on (1) a database of complexes, (2) biochemical descriptors of those complexes, and (3) machine learning tools. We use the Protein Data Bank (PDB) as the structural database of complexes and create diverse training and validation sets from it based on the "families of structurally similar proteins" (FSSP) hierarchy. For the biochemical descriptors, we consider terms from the DOCK score, empirical scoring, and buried solvent accessible surface area. For the machine learners, we use a random forest classifier and logistic regression. Our results were obtained on a test set of 44 structurally diverse protein targets. Our highest performing descriptor combinations obtained approximately 19-fold enrichment (39 of 44 binding complexes were correctly identified, while only 2 of 44 decoy complexes were accepted), and our best overall accuracy was 92%.
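The headline figures can be checked with a little arithmetic. The sketch below assumes that "enrichment" means the ratio of the true-positive rate to the false-positive rate, which is one reading consistent with the quoted numbers (the abstract does not define it). `screening_metrics` is a hypothetical helper for illustration, not part of PostDOCK.

```python
def screening_metrics(tp, n_pos, fp, n_neg):
    """Return (enrichment, accuracy) for a binary screen.

    ASSUMPTION: enrichment = true-positive rate / false-positive rate;
    the PostDOCK abstract does not state its exact definition.
    """
    tpr = tp / n_pos  # fraction of true binding complexes accepted
    fpr = fp / n_neg  # fraction of decoy complexes accepted
    tn = n_neg - fp   # decoys correctly rejected
    accuracy = (tp + tn) / (n_pos + n_neg)
    return tpr / fpr, accuracy


enrichment, accuracy = screening_metrics(tp=39, n_pos=44, fp=2, n_neg=44)
print(f"{enrichment:.1f}-fold, {accuracy:.1%}")  # 19.5-fold, 92.0%
```

Under that assumption, 39/44 binders accepted against 2/44 decoys gives 19.5-fold, and the same counts give 81 of 88 complexes classified correctly, about 92%. The abstract does not say whether the best accuracy and the best enrichment come from the same descriptor combination, so the match may be coincidental.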

Bagging is a small-data-set phenomenon

Chawla, Moore, Bowyer, et al.

Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report

Kegelmeyer, Shead, Dunlavy, 2013

This SAND report summarizes the activities and outcomes of the Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) LDRD project, which addressed improving the accuracy of conditional random fields for named entity recognition through the use of ensemble methods.

Conditional random fields (CRFs) are powerful, flexible probabilistic graphical models often used in supervised machine learning prediction tasks associated with sequence data. Specifically, they are currently the best known option for named entity recognition (NER) in text. NER is the process of labeling words in sentences with semantic identifiers such as "person", "date", or "organization".

Ensembles are a powerful statistical inference meta-method that can make most supervised machine learning methods more accurate, faster, or both. Ensemble methods are normally best suited to "unstable" classification methods with high variance error. CRFs applied to NER are very stable classifiers and, as such, would initially seem to be resistant to the benefits of ensembles.

The NEEEEIT project nonetheless worked out how to generalize ensemble methods to CRFs, demonstrated that accuracy can indeed be improved by proper use of ensemble techniques, and generated a new CRF code, "pyCrust", and a surrounding application environment, "NEEEEIT", which implement those improvements.

The summary practical advice that results from this work, then, is:

• When making use of CRFs for label prediction tasks in machine learning, use the pyCrust CRF base classifier with NEEEEIT's bagging ensemble implementation. (If those codes are not available, then destabilize your CRF code via every means available, and generate the bagged training sets by hand.)
• If you have ample pre-processing computational time, do "forward feature selection" to find and remove counter-productive feature classes.
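Generating bagged training sets by hand for a sequence labeler can be sketched as follows: each ensemble member is trained on a bootstrap resample of the training sentences (drawn with replacement), and the members' outputs are combined by a per-token majority vote. This is not pyCrust or NEEEEIT code; `UnigramTagger` is a hypothetical, deliberately simple stand-in for a CRF base classifier, and every name here is illustrative.

```python
import random
from collections import Counter, defaultdict

# A labeled sentence is a list of (word, label) pairs, e.g. [("Sandia", "ORG"), ("hired", "O")].


def bootstrap_sample(sentences, rng):
    """One bagged training set: len(sentences) draws with replacement."""
    return [rng.choice(sentences) for _ in range(len(sentences))]


class UnigramTagger:
    """Hypothetical stand-in for a CRF: labels each word with its most frequent training label."""

    def __init__(self, default="O"):
        self.default = default  # label for words never seen in training
        self.table = {}

    def fit(self, sentences):
        counts = defaultdict(Counter)
        for sentence in sentences:
            for word, label in sentence:
                counts[word][label] += 1
        self.table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        return self

    def predict(self, words):
        return [self.table.get(w, self.default) for w in words]


def train_bagged(sentences, n_members=11, seed=0):
    """Train n_members taggers, each on its own bootstrap resample."""
    rng = random.Random(seed)
    return [UnigramTagger().fit(bootstrap_sample(sentences, rng)) for _ in range(n_members)]


def predict_bagged(members, words):
    """Combine members by a per-token majority vote over their label sequences."""
    predictions = [m.predict(words) for m in members]
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


train = [
    [("Kegelmeyer", "PER"), ("works", "O"), ("at", "O"), ("Sandia", "ORG")],
    [("Sandia", "ORG"), ("is", "O"), ("in", "O"), ("California", "LOC")],
]
members = train_bagged(train)
print(predict_bagged(members, ["Kegelmeyer", "is", "at", "Sandia"]))
```

A stable base learner produces nearly identical members from different bootstrap samples, so the vote changes little; that is consistent with the report's advice to destabilize a CRF before bagging it.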
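Greedy forward feature selection can be sketched as below, assuming a hypothetical `score` callback that, in practice, would train the labeler on a given set of feature classes and return a held-out accuracy. Feature classes that never improve the score are simply never added, which is one way counter-productive classes get dropped. The feature-class names and gains in the toy example are made up.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def forward_select(
    feature_classes: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection over feature classes."""
    remaining = set(feature_classes)
    selected: FrozenSet[str] = frozenset()
    best = score(selected)  # baseline with no feature classes
    while remaining:
        # Try adding each remaining class (sorted so ties break deterministically).
        candidate, candidate_score = max(
            ((f, score(selected | {f})) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best:
            break  # no single addition helps; stop
        selected = selected | {candidate}
        best = candidate_score
        remaining.remove(candidate)
    return selected, best


# Toy, made-up scoring function: each class adds a fixed amount; one class hurts.
toy_gain = {"word": 0.5, "suffix": 0.2, "capitalization": 0.1, "noisy_gazetteer": -0.3}
chosen, best_score = forward_select(toy_gain, lambda s: sum(toy_gain[f] for f in s))
print(sorted(chosen), round(best_score, 3))  # ['capitalization', 'suffix', 'word'] 0.8
```

Each round retrains one model per remaining feature class, so with k classes the search can take on the order of k²/2 training runs, which fits the report's framing of it as worthwhile only when pre-processing time is ample.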
