2003
DOI: 10.1021/ci034160g

Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling

Abstract: A new classification and regression tool, Random Forest, is introduced and investigated for predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Random Forest is an ensemble of unpruned classification or regression trees created by using bootstrap samples of the training data and random feature selection in tree induction. Prediction is made by aggregating (majority vote or averaging) the predictions of the ensemble. …
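
The procedure the abstract describes (bootstrap samples, random feature selection at each node, unpruned trees, majority vote) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the Gini split criterion, the parameters `n_trees` and `mtry`, and the toy two-descriptor data set are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # a split must send samples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, mtry, rng):
    """Grow an unpruned tree; each node looks at a random subset of mtry features."""
    if len(set(y)) == 1:
        return y[0]  # pure node becomes a leaf
    feature_ids = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, feature_ids)
    if split is None:
        return Counter(y).most_common(1)[0][0]  # no useful split: majority leaf
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # sample with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Aggregate the individual tree predictions by majority vote."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two descriptors per compound, class 0 = inactive, 1 = active.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25],
     [0.8, 0.9], [0.9, 0.8], [0.85, 0.7], [0.7, 0.75]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y, n_trees=25, mtry=1, seed=0)
print(predict_forest(forest, [0.05, 0.05]), predict_forest(forest, [0.95, 0.95]))
```

For regression, the same structure would apply with a variance-based split criterion and with the vote replaced by the mean of the tree predictions, matching the "averaging" option in the abstract.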

Cited by 2,926 publications (2,029 citation statements)
References 27 publications
“…Data from our OCT1 HTS experiment were used as a training set to construct a binary structure−activity relationship (SAR) model correlating molecular features of 1780 compounds from the Pharmacon library with their inhibitory activities, discretized into two classes: inhibitors and noninhibitors. The Random Forest (RF) algorithm 33 was employed to build an ensemble classifier (SAR-I). We evaluated the accuracy of the SAR-I model (i.e., the area under the receiver operating characteristic curve (auROC)) by 100 repeated cross-validation runs ( Figure 7A).…”
Section: Results (mentioning)
confidence: 99%
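
The evaluation protocol in this statement (auROC estimated over 100 repeated cross-validation runs) can be sketched as follows. The cited study's code is not shown here, so everything below is an assumption: the fold count `k`, the function names, the toy data, and in particular the nearest-centroid scorer, which is only a stand-in for the Random Forest classifier so that the example stays self-contained.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_and_score(train_X, train_y, test_X):
    """Stand-in model (not RF): higher score means closer to the class-1 centroid."""
    c0 = centroid([x for x, l in zip(train_X, train_y) if l == 0])
    c1 = centroid([x for x, l in zip(train_X, train_y) if l == 1])
    return [dist2(x, c0) - dist2(x, c1) for x in test_X]

def repeated_cv_auc(X, y, n_runs=100, k=5, seed=0):
    """Run k-fold cross-validation n_runs times; return one pooled AUC per run."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_runs):
        order = list(range(len(y)))
        rng.shuffle(order)  # a fresh random partition for every run
        scores = [0.0] * len(y)
        for fold in range(k):
            test = order[fold::k]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            fold_scores = fit_and_score([X[i] for i in train],
                                        [y[i] for i in train],
                                        [X[i] for i in test])
            for i, s in zip(test, fold_scores):
                scores[i] = s  # out-of-fold score for sample i
        aucs.append(auc(scores, y))
    return aucs

# Hypothetical toy data: two well-separated classes of ten samples each.
X = [[i * 0.01, i * 0.02] for i in range(10)] + \
    [[1 + i * 0.01, 1 + i * 0.02] for i in range(10)]
y = [0] * 10 + [1] * 10
aucs = repeated_cv_auc(X, y, n_runs=100, k=5, seed=0)
print(len(aucs), mean(aucs))
```

Pooling the out-of-fold scores within a run is one of several reasonable choices; averaging per-fold AUCs is another, but it requires every test fold to contain both classes.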
“…More applications of random forests can be found in other different fields like quantitative structure-activity relationship modeling [42], nuclear magnetic resonance spectroscopy [31], or clinical decision supports in medicine in general [11].…”
Section: Some Other Related Applications (mentioning)
confidence: 99%
“…These steps are repeated until a defined number of trees are created. 107 This eventually leads to a forest of regression trees.…”
Section: QSPR Models (mentioning)
confidence: 99%