2013
DOI: 10.1186/1758-2946-5-9

Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

Abstract: Background: One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and which are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward or stepwise procedures, to further elaborated methodologies such as evolutionary programming. The problem lies in selecting the minimum subset of descriptors that ca…

Cited by 65 publications (63 citation statements)
References 65 publications
“…The method comes therefore to a non-linear consensus on unpruned decision trees. Two important variables (although the method is not very sensitive to their values) are the number of trees to grow in the forest (ntree) and the number of variables to choose at each node (mtry) [60,62]. Using RF there is a reduced risk of overfitting, since this approach uses a large number of simple models and includes the possibility to treat non-standard problems (number of descriptors higher than that of observations).…”
Section: Random Forest (mentioning)
confidence: 99%
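
The roles of ntree and mtry described in the statement above can be illustrated with a minimal sketch. It is not the R randomForest implementation or the cited authors' code: it is a toy, standard-library-only regression forest, and every function name (`grow_tree`, `grow_forest`, `predict_forest`) and the synthetic data are assumptions made for illustration. Each tree is grown unpruned on a bootstrap sample, `mtry` variables are drawn at random at each node, and the forest prediction is the average over `ntree` trees. The toy data deliberately has more descriptors (8) than observations (6).

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, features):
    """Return the (feature, threshold) with the lowest total SSE, or None."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            score = sse(left) + sse(right)
            if score < best_score:
                best_score, best = score, (f, thr)
    return best


def grow_tree(rows, targets, mtry, rng):
    """Grow an unpruned tree; a leaf is a float, an inner node is a tuple."""
    if len(set(targets)) == 1:
        return mean(targets)
    # mtry: number of variables drawn at random at this node.
    features = rng.sample(range(len(rows[0])), mtry)
    split = best_split(rows, targets, features)
    if split is None:  # all sampled variables are constant in this node
        return mean(targets)
    f, thr = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return (f, thr,
            grow_tree([r for r, _ in left], [t for _, t in left], mtry, rng),
            grow_tree([r for r, _ in right], [t for _, t in right], mtry, rng))


def predict_tree(node, row):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def grow_forest(rows, targets, ntree, mtry, seed=0):
    """ntree: number of trees, each grown on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, row):
    """Consensus prediction: the average over all trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic toy data (an assumption): 6 observations, 8 descriptors.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(8)] for _ in range(6)]
y = [10.0 * row[0] for row in X]  # the property depends on descriptor 0 only

forest = grow_forest(X, y, ntree=50, mtry=3)
prediction = predict_forest(forest, X[0])
```

Because every leaf stores the mean of a subset of training targets, the averaged prediction always stays within the range of the training values. In practice one would use an established implementation rather than this sketch.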
“…A popular approach in the literature is to apply tools from machine learning on certain DFT calculations to accelerate prediction of various properties of compounds [16-22]. Ideas from machine learning have been coupled with databases of ab initio calculations to estimate molecular electronic properties in chemical compound space, including the enthalpy of formation of compounds [23,24]. However, these methods still have the major disadvantage of requiring results from many DFT calculations, which may not be possible for alloys without given crystal structures, i.e., amorphous or noncrystalline alloys.…”
Section: Introduction (mentioning)
confidence: 99%
“…Random forest (RF) was used for feature selection. RF was a popular and efficient algorithm, based on model aggregation ideas, regardless of classification or regression problems [37]. RF was implemented by the component “Learn R Forest Model” in Pipeline Pilot 8.5, invoking the R package “RandomForest”.…”
Section: Methods (mentioning)
confidence: 99%