2011
DOI: 10.1002/minf.201100069

Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets

Abstract: Classification algorithms suffer from the curse of dimensionality, which leads to overfitting, particularly if the problem is over-determined. Therefore it is of particular interest to identify the most relevant descriptors to reduce the complexity. We applied Bayesian estimates to model the probability distribution of descriptors values used for binary classification using n-fold cross-validation. As a measure for the discriminative power of the classifiers, the symmetric form of the Kullback-Leibler divergen…
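
The abstract is truncated, so the authors' exact procedure is not visible here. As a rough illustration of the idea it describes, the sketch below ranks descriptors by the symmetric Kullback-Leibler divergence between their class-conditional distributions. The histogram binning, the Dirichlet (add-alpha) smoothing used as the Bayesian estimate, and the names `smoothed_histogram`, `symmetric_kl` and `rank_descriptors` are all assumptions made for this example, and the n-fold cross-validation mentioned in the abstract is left out for brevity.

```python
import numpy as np


def smoothed_histogram(values, bin_edges, alpha=1.0):
    """Posterior-mean bin probabilities under a symmetric Dirichlet(alpha) prior.

    Assumption: this add-alpha estimate stands in for the paper's
    "Bayesian estimates"; the original method may differ.
    """
    counts, _ = np.histogram(values, bins=bin_edges)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))


def symmetric_kl(p, q):
    """Symmetric KL divergence: KL(p||q) + KL(q||p) = sum((p - q) * log(p / q))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p - q) * np.log(p / q)))


def rank_descriptors(X, y, n_bins=10, alpha=1.0):
    """Score each descriptor (column of X) for a binary label vector y.

    Returns column indices sorted from most to least discriminative,
    together with the per-column scores.
    """
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # Shared bin edges so both class histograms are comparable.
        edges = np.linspace(col.min(), col.max(), n_bins + 1)
        p = smoothed_histogram(col[y == 1], edges, alpha)
        q = smoothed_histogram(col[y == 0], edges, alpha)
        scores.append(symmetric_kl(p, q))
    scores = np.array(scores)
    return np.argsort(scores)[::-1], scores


if __name__ == "__main__":
    # Synthetic data: descriptor 0 depends on the class, descriptor 1 is noise.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 100)
    X = np.column_stack([y + rng.normal(0.0, 0.3, 200), rng.normal(0.0, 1.0, 200)])
    order, scores = rank_descriptors(X, y)
    print("ranking:", order, "scores:", scores)
```

The smoothing keeps every bin probability strictly positive, so the logarithm is always defined even when one class has no values in a bin.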

Cited by 35 publications (34 citation statements)
References 31 publications
“…55) were built on severely unbalanced training sets and tested on clearly unbalanced external sets. As demonstrated by Carbon-Mangels et al 56. the relevance of machine-learning classification methods, and especially SVM, are negatively impacted by datasets with one significantly more populated class.…”
Section: One-panel-per-molecule Output (mentioning)
confidence: 99%
“…Non‐inhibition or ability to be substrate is labeled as (‐); [d] Renal organic cation transporter (OCT2) inhibition. Non‐inhibition is labeled as (‐); [e] Ability to be substrate or inhibit Cytochrome P450 isoenzymes (CYP450 2C9, CYP450 2D6, CYP450 3A4, CYP450 1A2, and CYP 2C19). It is shown the predicted isoform‐CYP that the compound potentially inhibits; [f] Human ether‐a‐go‐go‐related gene (hERG) inhibition.…”
Section: Results (mentioning)
confidence: 99%
“…[28] Non-inhibition is labeled as (-); [e] Ability to be substrate or inhibit Cytochrome P450 isoenzymes (CYP450 2C9, CYP450 2D6, CYP450 3A4, CYP450 1A2, and CYP 2C19). [29] It is shown the predicted isoform-CYP that the compound potentially inhibits; [f] Human ether-a-go-go-related gene (hERG) inhibition. [30,31] Non-inhibition is labeled as (-); [g] In vitro mutagenicity according to Ames test.…”
Section: Drug-like Properties Predictions (mentioning)
confidence: 99%
“…An ADME analysis includes the analysis of various properties such as ability to penetrate blood–brain barrier, 43 capability of human intestinal absorption, 43 Caco-2 permeability, 44 and abilities to function as a P-glycoprotein (P-gp) substrate 45 and inhibitor, 46, 47 renal organic cation transporter, 48 and cytochrome P450 substrate 49 and inhibitor. 50…”
Section: Results (mentioning)
confidence: 99%