Lin Sun scite author profile

Statistical-learning methods have been developed for facilitating the prediction of pharmacokinetic and toxicological properties of chemical agents. These methods employ a variety of molecular descriptors to characterize structural and physicochemical properties of molecules. Some of these descriptors are specifically designed for the study of a particular type of properties or agents, and their use for other properties or agents might generate noise and affect the prediction accuracy of a statistical learning system. This work examines to what extent the reduction of this noise can improve the prediction accuracy of a statistical learning system. A feature selection method, recursive feature elimination (RFE), is used to automatically select molecular descriptors for support vector machines (SVM) prediction of P-glycoprotein substrates (P-gp), human intestinal absorption of molecules (HIA), and agents that cause torsades de pointes (TdP), a rare but serious side effect. RFE significantly reduces the number of descriptors for each of these properties thereby increasing the computational speed for their classification. The SVM prediction accuracies of P-gp and HIA are substantially increased and that of TdP remains unchanged by RFE. These prediction accuracies are comparable to those of earlier studies derived from a selective set of descriptors. Our study suggests that molecular feature selection is useful for improving the speed and, in some cases, the accuracy of statistical learning methods for the prediction of pharmacokinetic and toxicological properties of chemical agents.

show abstract

Prediction of P-Glycoprotein Substrates by a Support Vector Machine Approach

Yang

Yap

Sun

et al. 2004

J. Chem. Inf. Comput. Sci.

158

147

View full text Add to dashboard Cite

P-glycoproteins (P-gp) actively transport a wide variety of chemicals out of cells and function as drug efflux pumps that mediate multidrug resistance and limit the efficacy of many drugs. Methods for facilitating early elimination of potential P-gp substrates are useful for facilitating new drug discovery. A computational ensemble pharmacophore model has recently been used for the prediction of P-gp substrates with a promising accuracy of 63%. It is desirable to extend the prediction range beyond compounds covered by the known pharmacophore models. For such a purpose, a machine learning method, support vector machine (SVM), was explored for the prediction of P-gp substrates. A set of 201 chemical compounds, including 116 substrates and 85 nonsubstrates of P-gp, was used to train and test a SVM classification system. This SVM system gave a prediction accuracy of at least 81.2% for P-gp substrates based on two different evaluation methods, which is substantially improved against that obtained from the multiple-pharmacophore model. The prediction accuracy for nonsubstrates of P-gp is 79.2% using 5-fold cross-validation. These accuracies are slightly better than those obtained from other statistical classification methods, including k-nearest neighbor (k-NN), probabilistic neural networks (PNN), and C4.5 decision tree, that use the same sets of data and molecular descriptors. Our study indicates the potential of SVM in facilitating the prediction of P-gp substrates.

show abstract

Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification

Sun

Zhang

Qian

et al. 2019

Information Sciences

216

View full text Add to dashboard Cite

Prediction of P‐Glycoprotein Substrates by a Support Vector Machine Approach.

et al. 2004

View full text Add to dashboard Cite

Feature selection using rough entropy-based uncertainty measures in incomplete decision systems

Sun

Tian

2012

Knowledge-Based Systems

125

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Lin Sun

Effect of Molecular Descriptor Feature Selection in Support Vector Machine Classification of Pharmacokinetic and Toxicological Properties of Chemical Agents

Prediction of P-Glycoprotein Substrates by a Support Vector Machine Approach

Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification

Prediction of P‐Glycoprotein Substrates by a Support Vector Machine Approach.

Feature selection using rough entropy-based uncertainty measures in incomplete decision systems

Contact Info

Product

Resources

About