2017
DOI: 10.1016/j.eswa.2016.12.008

Automatic selection of molecular descriptors using random forest: Application to drug discovery

Abstract: The optimal selection of chemical features (molecular descriptors) is an essential pre-processing step for the efficient application of computational intelligence techniques in virtual screening for identification of bioactive molecules in drug discovery. The selection of molecular descriptors has key influence in the accuracy of affinity prediction. In order to improve this prediction, we examined a Random Forest (RF)-based approach to automatically select molecular descriptors of training data for ligands of…

Cited by 110 publications (58 citation statements)
References 31 publications
“…RFs have been employed in a wide variety of classification and prediction problems (Scornet et al, 2015; Cano et al, 2017) as they are among the most effective computationally intensive algorithms to extract information from unstable estimates (Scornet et al, 2015). They are especially well suited for large, high-dimensional datasets, where problem complexity and scale render direct discovery of a good model in a single step impossible (Büchlmann and Yu, 2002; Kleiner et al, 2014; Wager et al, 2014).…”
Section: Methodology Random Forests (mentioning)
confidence: 99%
“…where c is the number of classes and p(i|t), p(j|t) are the estimated probabilities of classes i, j at node t (Cano et al, 2017). In this context, Mean Decrease Gini (MDG) aggregates the Gini gain over all splits and trees to assess the classifying capacity of a variable (Friedman et al, 2009) and is thus a metric of the homogeneity of nodes and leaves in the RF (Bluemke and Stepień, 2016).…”
Section: Evaluation Criteria (mentioning)
confidence: 99%
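
The equation this statement refers to is cut off in the excerpt, so the following is only a minimal sketch under stated assumptions. It uses the standard Gini impurity, 1 - sum_i p(i|t)^2 (equivalently the sum over i != j of p(i|t) p(j|t)), and a simplified Mean Decrease Gini that sums each variable's impurity decrease within a tree and averages over trees. The function names, the split representation, and the descriptor names `logP` and `mw` are hypothetical, chosen for illustration, and are not taken from Cano et al.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity of a node t: 1 - sum_i p(i|t)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    children = (len(left) * gini_impurity(left)
                + len(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - children


def mean_decrease_gini(trees):
    """Simplified MDG (an assumption, not a library's exact definition).

    `trees` is a list of trees; each tree is a list of splits given as
    (variable, parent_labels, left_labels, right_labels). Decreases are
    summed per variable within a tree, then averaged over all trees.
    """
    totals = defaultdict(float)
    for tree in trees:
        for variable, parent, left, right in tree:
            totals[variable] += gini_decrease(parent, left, right)
    return {variable: total / len(trees) for variable, total in totals.items()}


# Hypothetical two-tree forest over two made-up descriptors.
node = ["active", "active", "inactive", "inactive"]
forest = [
    [("logP", node, ["active", "active"], ["inactive", "inactive"])],  # pure children
    [("mw", node, ["active", "inactive"], ["active", "inactive"])],    # no improvement
]
mdg = mean_decrease_gini(forest)
```

In this toy forest the split on `logP` separates the classes completely while the split on `mw` leaves both children as mixed as the parent, so `logP` receives the higher score. Production implementations typically also weight each split by the share of training samples reaching the node.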
“…[26], Tse et al used genetic algorithms to automatically select parameters in the design of an optimal complex Morlet wavelet filter, which was applied to bearing fault detection. In the same way, the automatic selection of chemical features for the identification of bioactive molecules in drug discovery was presented by Cano et al [27].…”
Section: Fig (mentioning)
confidence: 98%
“…For example, ECFPs 20 are one of the most widely used fingerprints. To develop DTI prediction methods, a diverse set of ML techniques are employed (together with the feature vectors generated using abovementioned descriptors) such as random forest (RF) 21,22, support vector machines (SVM) 22,23, logistic regression (LR) 24.…”
Section: Introduction (mentioning)
confidence: 99%