This work reports a classification study conducted on the largest COX-2 inhibitor data set to date. Using 2925 diverse COX-2 inhibitors collected from 168 publications, we applied two machine learning methods, support vector machine (SVM) and random forest (RF), to develop 12 classification models. The best SVM and RF models achieved Matthews correlation coefficient (MCC) values of 0.73 and 0.72, respectively. The 2925 COX-2 inhibitors were then reduced to a data set of 1630 molecules by removing intermediately active inhibitors, and 12 new classification models were constructed, yielding MCC values above 0.72. On the external test set, the best MCC value, 0.68, was obtained by the RF model using ECFP_4 fingerprints. Moreover, the 2925 COX-2 inhibitors were clustered into eight subsets, and the structural features of each subset were investigated. We identified substructures important for activity, including halogen, carboxyl, sulfonamide, and methanesulfonyl groups, as well as aromatic nitrogen atoms. The models developed in this study could serve as useful tools for compound screening prior to laboratory testing.
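The MCC values quoted above summarize a binary confusion matrix in a single number between -1 and 1. The following is a minimal sketch of the standard MCC formula, not the authors' code; the function name `mcc_from_counts` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn: true positives/negatives, fp/fn: false positives/negatives.
    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(round(mcc_from_counts(tp=90, tn=80, fp=20, fn=10), 3))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when active and inactive compounds are unevenly represented.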
Inflammatory diseases can be treated by inhibiting 5-lipoxygenase-activating protein (FLAP). In this study, a data set containing 2,112 FLAP inhibitors was collected. A total of 25 classification models were built by combining five machine learning algorithms with five different types of fingerprints. The best model, built with the support vector machine algorithm and ECFP_4 fingerprints, achieved an accuracy of 0.862 and a Matthews correlation coefficient of 0.722 on the test set. The predicted results were further evaluated with the application domain measure d_STD-PRO (a distance between a compound and the models). Each compound had a d_STD-PRO value, which was calculated from the predicted probabilities obtained from all 25 models. The application domain results suggested that the reliability of the predictions depended mainly on the compounds themselves rather than on the algorithms or fingerprints. A customized 10-bit fingerprint was manually defined and used to cluster the molecular structures of the 2,112 FLAP inhibitors into eight subsets by K-Means. According to the clustering results, most of the inhibitors in two subsets (subsets 2 and 4) were highly active. We found that aryl oxadiazole/oxazole alkanes, biaryl amino-heteroarenes, two aromatic rings (often N-containing) linked by a cyclobutene group, and the 1,2,4-triazole group were typical fragments in highly active inhibitors.
Keywords: 5-lipoxygenase-activating protein inhibitor, classification model, machine learning, structure clustering, support vector machine
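The clustering step described above, K-Means applied to short binary fingerprints, can be sketched as follows. This is an illustrative pure-Python version of the standard Lloyd iteration, not the authors' implementation; the toy fingerprints, the function names, and the choice to seed centroids from distinct fingerprints are all assumptions, and the meaning of each of the 10 bits is not given in the abstract.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster fixed-length bit vectors with Lloyd's K-Means algorithm.

    Initial centroids are drawn from the distinct fingerprints so that
    no two clusters start from the same position (an assumption made
    here for robustness on small, duplicate-heavy data).
    """
    unique = sorted(set(tuple(p) for p in points))
    if k > len(unique):
        raise ValueError("k exceeds the number of distinct fingerprints")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(unique, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 10-bit fingerprints; each bit would flag one
    # manually defined structural feature.
    fingerprints = [
        (1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
        (1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
        (0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
        (0, 0, 0, 0, 0, 0, 1, 1, 1, 1),
    ]
    labels, _ = kmeans(fingerprints, k=2)
    print(labels)
```

For the data set described above, `k` would be 8 and `points` would hold one 10-bit fingerprint per inhibitor.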